Fix the Alert Fatigue Mess with DevOps Intelligence

Dec 5, 2016

Alert fatigue is widely considered the #1 pain point for traditional IT teams and modern DevOps engineers alike, especially for those who provide operational support for their applications and production infrastructure.

With the increased adoption of cloud and the emergence of microservices architectures for building new-generation systems, we are quadrupling the number of metrics we monitor: server metrics, container metrics, app/web/DB server metrics, and application metrics. Why? Because of monitoring hell, the need to monitor far more things than we did in the traditional world. And this alerts hell is only going to grow exponentially.

Undoubtedly, DevOps is maturing, and a plethora of alert email management tools are available. However, because of this alert overload (especially for non-critical events or events that require no action), engineers are becoming numb to alerts. The ‘crying wolf syndrome’ sets in: they start ignoring even critical warnings, assuming these are just more meaningless alerts. As a result, alert emails largely fail at their whole purpose.

To this end, Botmetric analyzed what DevOps and Operations Engineers want in place of these alert emails, and we unearthed a few interesting facts:

Ability to separate signal from noise
Scope-aware alerting to reduce the flood
Alert intelligence and event diagnostics instead of emails
Alert event remediation with workflow handlers

To learn more, read this post by Botmetric CEO Vijay Rayapati. It sheds light on what DevOps and Operations Engineers want in place of these alert emails, and how DevOps intelligence can be used to fix alerts hell in the cloud world.

Written by Nutanix