Yuval

Reputation: 834

Sum duration when metric was above/below threshold in Prometheus

Consider a counter my_counter with a label success. I created a success rate metric with this query: rate(my_counter{success="true"}[10m])/rate(my_counter[10m]).

Now I want to know how much time the rate was below a certain threshold. I have a Grafana dashboard with Prometheus as its data source. With Grafana I can easily pick a time range, but I still need a way to sum the time where my condition applies.

Any ideas?

Upvotes: 4

Views: 3755

Answers (1)

Michael Doubez

Reputation: 6863

There are four parts to your question:

  1. have an indicator that takes the value 1 when a condition is met and 0 otherwise - this is done by using the bool modifier on a comparison operator (use < instead of > for a below-threshold condition).

    rate(something[5m]) > bool 0.99

  2. compute the fraction of time the condition is met - this is done by applying the avg_over_time function to the indicator:

    avg_over_time(condition[1d])

  3. get everything in a single query - you need to use recording rules or have a Prometheus version that supports subqueries (2.7 or later)

  4. and the last is to use Grafana's $__range and $__range_s variables to fill in the dashboard time frame, giving either the ratio or the duration (in seconds) for which the condition was met.

    avg_over_time( condition[$__range] )

    avg_over_time( condition[$__range] ) * $__range_s
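The arithmetic behind steps 1, 2 and 4 can be sketched outside Prometheus. This is only an illustration in Python, assuming evenly spaced evaluation points; the function name fraction_and_duration and the sample values are hypothetical, not part of any Prometheus or Grafana API.

```python
def fraction_and_duration(samples, threshold, range_seconds):
    """Mimic `avg_over_time(x > bool threshold)` and its scaling by the range.

    samples: rate values evaluated at evenly spaced steps across the range
             (an assumption of this sketch).
    """
    # Step 1: `> bool threshold` yields 1 when the comparison holds, else 0.
    indicator = [1 if value > threshold else 0 for value in samples]
    # Step 2: averaging a 0/1 series gives the fraction of samples that matched.
    fraction = sum(indicator) / len(indicator)
    # Step 4: multiplying by the range length in seconds approximates a duration.
    return fraction, fraction * range_seconds


# Hypothetical rate values, one every 15 minutes over a 1-hour range.
rates = [0.995, 0.98, 0.999, 0.992]
fraction, seconds = fraction_and_duration(rates, 0.99, 3600)
print(fraction, seconds)
```

Three of the four samples exceed 0.99, so the fraction is 0.75 and the estimated duration is 2700 seconds. The result is only as precise as the spacing between samples.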

Putting it all together gives a hard-to-read expression, so you may do better using the -- Grafana Dashboard -- data source available in recent versions. With a subquery, the combined query is:

    avg_over_time( (rate(something[5m]) > bool 0.99)[$__range:] )
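Outside Grafana the $__range and $__range_s variables are not available, so the range has to be filled in by hand. A minimal sketch in Python of assembling the same expression for the Prometheus HTTP API instant-query endpoint (/api/v1/query), assuming a server at localhost:9090; the helper name build_duration_query is hypothetical.

```python
from urllib.parse import urlencode


def build_duration_query(expr, threshold, range_duration, range_seconds):
    """Build a subquery giving the seconds during which expr exceeded threshold.

    range_duration is a PromQL duration such as "1d"; range_seconds is the same
    span in seconds (the role $__range and $__range_s play in Grafana).
    """
    # 0/1 indicator from the bool modifier on the comparison.
    condition = f"({expr}) > bool {threshold}"
    # Subquery over the whole range, averaged, then scaled to seconds.
    return f"avg_over_time(({condition})[{range_duration}:]) * {range_seconds}"


query = build_duration_query("rate(something[5m])", 0.99, "1d", 86400)
# Assumed server address; adjust to your own Prometheus instance.
url = "http://localhost:9090/api/v1/query?" + urlencode({"query": query})
print(query)
print(url)
```

The two range arguments must describe the same span, otherwise the scaled result no longer reads as a duration.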

Upvotes: 4
