Alert fatigue is widely considered the #1 pain point for traditional IT teams and modern DevOps engineers alike, especially for those who provide operational support for their applications and production infrastructure.

With increased adoption of the cloud and the emergence of microservices architecture for building new-generation systems, we are quadrupling the number of metrics monitored: server metrics, container metrics, app/web/DB server metrics, and application metrics. Why? Because of monitoring hell: the need to monitor more things than we did in the traditional world. And this alerts hell is only going to get worse.

Undoubtedly, DevOps is maturing and there is a plethora of alert email management tools available. However, because of this alert overload (especially for non-critical events or events that require no action), engineers are becoming numb to alerts. The ‘crying wolf syndrome’ creeps up on them: they start ignoring even critical warnings, assuming they are meaningless. The whole purpose of sending alert emails is thus defeated.

To this end, Botmetric analyzed what DevOps and operations engineers want in place of these alert emails. We unearthed a few interesting facts:

  • Ability to understand signal over noise
  • Need for scope-aware alerting to reduce the flood
  • Dire need for alert intelligence and event diagnostics over emails
  • Requirement of alert event remediation with workflow handlers
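
To make the first two points concrete, here is a minimal Python sketch of what separating signal from noise with scope-aware filtering could look like. It is only an illustration, not how Botmetric or any particular tool works: the `Alert` fields, the `filter_alerts` function, the severity names, and the 300-second repeat window are all assumptions made up for this example. Diagnostics and remediation workflows are not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # All fields are hypothetical; real alert payloads vary by tool.
    source: str       # host or service that raised the alert
    check: str        # name of the check that fired
    severity: str     # "info", "warning", or "critical"
    timestamp: float  # seconds since an arbitrary epoch
    actionable: bool  # whether someone needs to do something


# Assumed severity ordering, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def filter_alerts(alerts, scope, min_severity="warning", window=300.0):
    """Return only the alerts worth an engineer's attention.

    An alert is kept when its source is in `scope`, it is actionable,
    its severity is at least `min_severity`, and the same
    (source, check) pair has not already been kept within the last
    `window` seconds.
    """
    last_kept = {}  # (source, check) -> timestamp of last kept alert
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if alert.source not in scope:
            continue  # outside this team's scope
        if not alert.actionable:
            continue  # nothing to do, so it is noise
        if SEVERITY_RANK[alert.severity] < SEVERITY_RANK[min_severity]:
            continue  # below the severity threshold
        key = (alert.source, alert.check)
        previous = last_kept.get(key)
        if previous is not None and alert.timestamp - previous < window:
            continue  # repeat of an alert that was already sent
        last_kept[key] = alert.timestamp
        kept.append(alert)
    return kept
```

Calling `filter_alerts(alerts, scope={"web-1"})` would keep only actionable warnings and criticals from `web-1`, and at most one per check every five minutes. Where the thresholds should sit is a judgment call for each team; the point is that the filtering happens before the email goes out.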

To learn more, read this post by Botmetric CEO Vijay Rayapati. It sheds light on what DevOps and operations engineers want in place of these alert emails, and on how DevOps intelligence can be used to fix alerts hell in the cloud world.

Written by Nutanix

We make infrastructure invisible, elevating IT to focus on the applications and services that power their business.
