Incident Management: Grafana OnCall

Yusuf Tayman
Nov 21, 2021

Incident management refers to a set of practices, processes, and solutions that enable teams to detect, investigate and respond to incidents. It is a critical element for businesses of all sizes and a requirement to meet most data compliance standards.

Incident management processes help teams become aware of security vulnerabilities or other issues quickly and address them. Faster responses reduce the overall impact of incidents, limit damage, and ensure systems and services continue to operate as planned.

Without incident management, you could lose valuable data, experience reduced productivity and revenue due to downtime, or be held liable for breaches of service level agreements (SLAs). Even when events are insignificant and do not cause permanent damage, they should still be addressed because they can foreshadow future errors.

Most companies handle incident management with several different tools, often a separate one for each step. Using different tools for monitoring, alerting, and sending messages makes the process even more difficult.

Let’s set them all aside, because I will not describe those tools one by one. Instead, I want to introduce a service that lets you manage all of this from one place: Grafana OnCall.

What is Grafana OnCall?

Grafana OnCall is an incident management tool that you can use inside Grafana. So what can we do with this tool?

* Incident Management
* Incident Communication
* Incident Escalation

In short, it offers end-to-end incident management from the moment an incident occurs. Let’s take a closer look at Grafana OnCall’s features.

Incident Management

Grafana OnCall lets us manage incidents from one place, and it does so with the excellent Grafana interface.

It supports not only the alerts you manage via Grafana but also alerts from your integrations with other tools, so you can move your existing incident management over easily.

You can also group your incidents as you wish, or use the silence feature to mute them and put an end to the information pollution in your incident channels. In addition, you can list incidents by integration, which makes it easy to respond to several of them at once.

Incident Communications

The most important point in incident communication is to inform users as soon as an incident occurs and to communicate within the team. The notification needs to reach the right person, and quickly.

Grafana OnCall provides user-based notifications that you define as a sequence of steps: for example, send a mention on Slack when an incident occurs, wait x minutes, then send an SMS or make a phone call.
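To make the idea of a step-by-step notification policy concrete, here is a minimal sketch in plain Python. It is not the Grafana OnCall API; the `Step` class, the `schedule` function, the channel names, and the 5-minute waits are all assumptions chosen for illustration. It only shows how a "notify, wait, notify" sequence turns into a timeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """One step of a hypothetical per-user notification policy."""
    action: str        # "notify" or "wait"
    channel: str = ""  # used by "notify" steps, e.g. "slack_mention"
    minutes: int = 0   # used by "wait" steps


def schedule(policy: List[Step]) -> List[Tuple[int, str]]:
    """Return (minutes_after_incident, channel) for every notify step."""
    elapsed = 0
    timeline: List[Tuple[int, str]] = []
    for step in policy:
        if step.action == "wait":
            elapsed += step.minutes
        elif step.action == "notify":
            timeline.append((elapsed, step.channel))
        else:
            raise ValueError(f"unknown action: {step.action}")
    return timeline


# Illustrative policy: Slack mention first, then SMS, then a phone call.
# The 5-minute waits stand in for the "x minutes" mentioned above.
policy = [
    Step("notify", channel="slack_mention"),
    Step("wait", minutes=5),
    Step("notify", channel="sms"),
    Step("wait", minutes=5),
    Step("notify", channel="phone_call"),
]

for minute, channel in schedule(policy):
    print(f"t+{minute} min: {channel}")
```

In a real setup the later steps would only fire while the incident is still unacknowledged; the sketch leaves that check out to keep the sequence itself easy to read.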

And the Oscar Goes to Incident Escalation

With the escalation feature, incident alerts from your different integrations can be routed to the relevant Slack channels. You can direct notifications to whichever team is responsible for each integration, and if no action is taken, you can notify different people. This makes it easy to manage all your integrations.
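The routing and escalation logic described here can be sketched roughly as follows. This is again plain Python rather than Grafana OnCall configuration, and the integration names, Slack channels, people, and timeouts are made up for the example. Each integration maps to a chain of targets, and the alert moves to the next target when nobody has acted within the timeout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EscalationLevel:
    target: str           # Slack channel or person to notify (hypothetical)
    timeout_minutes: int  # how long to wait for action before escalating


# Hypothetical routes: one escalation chain per integration.
ROUTES: Dict[str, List[EscalationLevel]] = {
    "prometheus": [
        EscalationLevel("#team-platform", 10),
        EscalationLevel("platform-lead", 15),
    ],
    "sentry": [
        EscalationLevel("#team-backend", 10),
        EscalationLevel("backend-lead", 15),
    ],
}


def current_target(integration: str, minutes_without_action: int) -> Optional[str]:
    """Return who should be notified for an incident nobody has acted on yet."""
    chain = ROUTES.get(integration)
    if not chain:
        return None  # no route configured for this integration
    deadline = 0
    for level in chain:
        deadline += level.timeout_minutes
        if minutes_without_action < deadline:
            return level.target
    # Every timeout has expired: stay on the last level of the chain.
    return chain[-1].target


print(current_target("prometheus", 0))   # first level of the chain
print(current_target("prometheus", 12))  # escalated to the second level
```

Keeping the chain as data rather than code means a new integration or an extra escalation level is only a change to `ROUTES`.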

And as you can see, you can use all these features in a very simple way. 🤓

For more: https://grafana.com/docs/grafana-cloud/oncall/
