youngprog

Prometheus alerting on 2 consecutive failures

I have a metric with a label called status that tells me whether a job succeeded or failed.

I would like to build an alert that fires when I get two failures in a row, regardless of how much time passes between them.
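Here is a minimal sketch of the intended condition in plain Python, not a Prometheus rule. It assumes each job run reports exactly one status value ("success" or "fail") and that runs are seen in order. The function name has_consecutive_failures is hypothetical. A success resets the streak, and timestamps are ignored entirely, which is the "regardless of time" part.

```python
def has_consecutive_failures(statuses, n=2):
    """Return True if the ordered run results contain n failures in a row."""
    streak = 0
    for status in statuses:
        # Any non-"fail" result resets the streak; time between runs is ignored.
        streak = streak + 1 if status == "fail" else 0
        if streak >= n:
            return True
    return False


print(has_consecutive_failures(["success", "fail", "fail"]))  # True
print(has_consecutive_failures(["fail", "success", "fail"]))  # False
```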

Answers (1)

Petar Nikolov

Alerting on two consecutive failures is tricky because Prometheus essentially runs two separate timers, one for scraping metrics and one for evaluating alert rules, and they are not always in sync (explained here). One way to work around this is to tune your query to the scrape interval: set the range of count_over_time to twice your scrape interval, so that the window always covers at least two scraped samples. For example, if your scrape_interval is 15s, the query would look something like this:

count_over_time(metric_name{status="fail"}[30s]) > 1
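To make the windowing concrete, here is a rough Python simulation of the query above. It is a simplified model, not Prometheus itself, and assumes: scrapes happen exactly every 15s with no jitter; the status="fail" series only has a sample at a scrape when the job's latest run failed; and the range is left-open, (t - 30s, t], as in Prometheus 3.x (2.x also included a sample sitting exactly on t - 30s). All function names are hypothetical.

```python
# Simplified model of: count_over_time(metric_name{status="fail"}[30s]) > 1
SCRAPE_INTERVAL = 15          # seconds, the scrape_interval from the answer
WINDOW = 2 * SCRAPE_INTERVAL  # the [30s] range, twice the scrape interval


def fail_sample_times(statuses, interval=SCRAPE_INTERVAL):
    """Timestamps of scrapes that produced a status="fail" sample."""
    return [i * interval for i, s in enumerate(statuses) if s == "fail"]


def count_over_time(sample_times, t, window=WINDOW):
    """Count samples in the left-open window (t - window, t]."""
    return sum(1 for ts in sample_times if t - window < ts <= t)


def alert_fires(statuses, t):
    """Evaluate the alert expression at evaluation time t."""
    return count_over_time(fail_sample_times(statuses), t) > 1


# Scrapes at t = 0, 15, 30, 45 seconds.
statuses = ["success", "fail", "fail", "success"]
print(alert_fires(statuses, 30))  # True: (0, 30] holds fail@15 and fail@30
print(alert_fires(statuses, 45))  # False: (15, 45] holds only fail@30
```

In this model the alert fires as soon as two back-to-back scrapes report a failure and clears once the older one slides out of the 30s window.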

Even this has one shortcoming: if the evaluation and scrape timers line up in a particular way, the past 30s can contain three scrapes instead of two. If those three read fail > success > fail, the failure count is still 2 and the alert fires even though the failures were not consecutive. I don't expect this to happen often, if at all, but you should still consider it a possibility.
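The fail > success > fail case can be reproduced in a similarly simplified model. This sketch assumes the window includes both ends, [t - 30s, t], and that the evaluation lands exactly on a scrape timestamp, so three 15s scrapes fit inside one 30s range. The helper names are hypothetical; last_two_failed is only there to compare against what the question actually asks for.

```python
SCRAPE_INTERVAL = 15
WINDOW = 2 * SCRAPE_INTERVAL


def count_fail_closed(statuses, t, interval=SCRAPE_INTERVAL, window=WINDOW):
    """Count status="fail" samples whose timestamps fall in [t - window, t]."""
    return sum(
        1
        for i, s in enumerate(statuses)
        if s == "fail" and t - window <= i * interval <= t
    )


def last_two_failed(statuses):
    """The condition the question asks for: the two latest results both failed."""
    return len(statuses) >= 2 and statuses[-1] == "fail" and statuses[-2] == "fail"


statuses = ["fail", "success", "fail"]      # scrapes at t = 0, 15, 30
print(count_fail_closed(statuses, 30) > 1)  # True: the alert fires anyway
print(last_two_failed(statuses))            # False: no two failures in a row
```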