Calculating availability in Prometheus based on response value

Question

I am trying to calculate the availability of Elasticsearch using Prometheus. One of the jobs that runs gets the cluster status as a value of 0, 1 or 2, where anything above 1 is considered unavailable. Using the answer from here does not work because all the jobs succeed, so the query has to do something along the lines of:

avg_over_time(es_cluster_status{cluster="name", instance="my_es"}>1[24h])

This does not work, however, because of the >1.

Alin Sînpălean · Accepted Answer

Prometheus does not support filtering samples in range vectors. The >1 only works for filtering instant vectors based on their current value, and a range selector such as [24h] cannot be applied to the result of that comparison.

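The simplest workaround is for you to define a recording rule that behaves just like the up metric does (0 when your target is down, 1 otherwise). Something like es_cluster_status{cluster="name", instance="my_es"} <= bool 1. The bool modifier matters here: without it, the comparison acts as a filter that drops the samples above 1 and keeps the original values of the rest, instead of returning 0 or 1. Then you could apply avg_over_time() on the recorded metric and get the availability over any given range.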
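
As a rough sketch of that workaround, a rule file could look something like the following. The file name, group name and the recorded metric name es_cluster_available are made up for illustration; only es_cluster_status and its labels come from the question.

```yaml
# rules/es_availability.yml (hypothetical file name)
groups:
  - name: es_availability            # hypothetical group name
    rules:
      # Records 1 while the cluster status is 0 or 1 (available)
      # and 0 while it is 2 (unavailable). The bool modifier makes
      # the comparison return 0/1 instead of filtering samples out.
      - record: es_cluster_available # hypothetical metric name
        expr: es_cluster_status <= bool 1

# Once the rule has been evaluated for a while, the availability over
# the last 24 hours (as a fraction between 0 and 1) would be:
#   avg_over_time(es_cluster_available{cluster="name", instance="my_es"}[24h])
```

Note that a recording rule only produces samples from the moment it is loaded, so the 24h average becomes meaningful only once a day of recorded data exists. If you are on Prometheus 2.7 or later, a subquery such as avg_over_time((es_cluster_status{cluster="name", instance="my_es"} <= bool 1)[24h:1m]) should give a similar result without a rule, where the 1m resolution is an arbitrary choice.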
