Reputation: 931
I want to check whether a certain metric has been unavailable in Prometheus for 5 minutes.
I am using absent(K_KA_GCPP) with a 5-minute threshold, but it seems I cannot group the absent function by certain labels such as Site Id.
absent only works if the metric is unavailable for all 4 site IDs. I want to detect when the metric is unavailable for just 1 site ID out of the 4, and I don't want to hardcode the site ID labels in the query; it should be generic. Is there any way I can do that?
Upvotes: 12
Views: 26250
Reputation: 2888
I was able to achieve this by doing something like this:
count(up{job="prometheus"} offset 1h) by (project) unless count(up{job="prometheus"} ) by (project)
This returns every project whose series existed one hour ago but is missing now, so an alert built on it triggers when the metric disappears.
You can add any labels you need in the by clause (that's helpful in alerting, for example).
Source: Prometheus Alert for missing metrics and labels
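As a rough sketch, the same pattern applied to the metric from the question could look like the query below. The label name site_id is an assumption (the question only says "Site Id"), so substitute whatever label your series actually carry.

```promql
# Site IDs that reported K_KA_GCPP one hour ago but report nothing now.
# "site_id" is a hypothetical label name; adjust it to your own labels.
count(K_KA_GCPP offset 1h) by (site_id)
  unless
count(K_KA_GCPP) by (site_id)
```

If this is used in an alerting rule with a 5-minute "for" duration, keep the offset clearly longer than that duration. Once the series has been gone for longer than the offset, the left-hand side becomes empty too and the expression stops returning that site.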
Upvotes: 7
Reputation: 3704
You can use it as a group; see how to configure an alert rule group.
You can also use the absent_over_time function.
absent returns only a single result, which in your case corresponds to a single site ID.
absent(<expr>)
Returns an empty vector if the vector passed to it has any elements and a 1-element vector with the value 1 if the vector passed to it has no elements. This is useful for alerting on when no time series exist for a given metric name and label combination.
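For illustration, a minimal absent_over_time query for one site might look like this. The label name site_id and the value "site-1" are placeholders, not taken from the question.

```promql
# Returns a single series with value 1 if K_KA_GCPP{site_id="site-1"}
# had no samples at all during the last 5 minutes, and nothing otherwise.
# "site_id" and "site-1" are hypothetical; use your own label and value.
absent_over_time(K_KA_GCPP{site_id="site-1"}[5m])
```

Note that this needs one expression per site, because the labels on the result are taken from the equality matchers in the selector. It therefore does not avoid hardcoding the site IDs.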
Upvotes: -1
Reputation: 61
The offset approach is, I feel, a great starting point, but it has a big weakness: if there is no sample at time - offset, the query doesn't return what you'd like it to.
I reworked the answer from Ahmed to this:
group(present_over_time(myMetric{label1="asd"}[3h])) by (labels) unless group(myMetric{label1="asd"}) by (labels)
present_over_time should fix the aforementioned problem.
group() aggregation, since you don't need the value.
up{} is a state of the scraped target, not the "metric is present" information, which I feel might not be equivalent.
Upvotes: 3