Alert for Missing Data on Single Prometheus Metric in Grafana Query

Question

Prometheus collects metrics for a queuing service (beanstalkd) from a separate metrics provider (beanstalkd-exporter). A few times a day, I notice missing data for some of the queues.

There are a lot of queues, so I group them into a few graphs, with queries that look something like this:

tube_current_jobs_ready{tube=~".*some_suffix"}

This returns a series for every queue whose name ends with "some_suffix". One or more of these, but not all, will sometimes have no data: a gap in the graph, not a zero value (assume that the whys and hows of that happening are out of scope for this question).

I already have alerts for when there is no data for the query, and they trigger when all the metrics returned are null, as expected. What I need is an alert for when there is no data for one or more of the metrics returned by the query.

valyala · Accepted Answer

Try the following query for the alert:

count_over_time(tube_current_jobs_ready{tube=~".*some_suffix"}[D]) < N

This query returns the matching time series whose number of raw samples over the previous duration D is less than N. Choose D and N based on the expected interval between raw samples for each time series (scrape_interval in Prometheus terms). For example, the following query returns the time series with fewer than 4 samples over the last 5 minutes:

count_over_time(tube_current_jobs_ready{tube=~".*some_suffix"}[5m]) < 4
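As a rough illustration of how D and N relate to the scrape interval, here is a small Python sketch that derives the threshold and builds the query string. The 60-second scrape interval is an assumption (the question does not state it), and the helper names `expected_samples` and `missing_data_query` are made up for this example; they are not part of Prometheus or Grafana.

```python
def expected_samples(window_seconds: int, scrape_interval_seconds: int) -> int:
    """Roughly how many samples a healthy series should have in the window."""
    if window_seconds <= 0 or scrape_interval_seconds <= 0:
        raise ValueError("window and scrape interval must be positive")
    return window_seconds // scrape_interval_seconds


def missing_data_query(selector: str, window_seconds: int,
                       scrape_interval_seconds: int,
                       tolerated_misses: int = 1) -> str:
    """Build a count_over_time alert query for the given selector.

    The query matches a series once it has missed more than
    `tolerated_misses` scrapes within the window.
    """
    threshold = expected_samples(window_seconds, scrape_interval_seconds) - tolerated_misses
    if threshold < 1:
        raise ValueError("window too short for this scrape interval and tolerance")
    # PromQL accepts durations in seconds, e.g. [300s] is the same as [5m].
    return f"count_over_time({selector}[{window_seconds}s]) < {threshold}"


# Assumed setup: 5-minute window, 60-second scrape interval, one missed scrape tolerated.
query = missing_data_query('tube_current_jobs_ready{tube=~".*some_suffix"}', 300, 60)
print(query)
# count_over_time(tube_current_jobs_ready{tube=~".*some_suffix"}[300s]) < 4
```

Under these assumptions a healthy series has about 5 samples in the window, so N = 4 tolerates a single missed scrape and matches on the second. One thing to keep in mind: count_over_time returns nothing for a series that has no samples at all in the window, so D should be longer than the gaps you want to catch.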
