Reputation: 2148
I have many servers monitored by Prometheus, and every host exposes the same metrics.
I need an alert rule that fires when a specific metric (such as some_metrics) has been missing on a specific host for 5m.
I checked absent and absent_over_time, but these functions do not return the labels of the missing metric, such as ip or hostname.
I also don't want to create a separate rule for each host.
I have searched for a solution but haven't found one.
Is there any workaround?
Upvotes: 2
Views: 6076
Reputation: 17784
You can try something like the following PromQL query:
(some_metric offset 5m) unless some_metric
It returns the some_metric series, with all their labels, that had a value 5 minutes ago but have no value now.
This query uses the following PromQL features: the offset modifier and the unless set operator.
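To turn the query into an alert, it can be placed in a Prometheus rule file roughly like this. This is a sketch: the group name, alert name and annotation text are hypothetical, and it assumes the some_metric series carry an instance label (Prometheus attaches one from the scrape target by default). There is deliberately no for: clause, because the expression only matches for about the length of the offset after a series disappears, so a long for: could keep the alert from ever firing.

```yaml
groups:
  - name: missing-metrics              # hypothetical group name
    rules:
      - alert: SomeMetricDisappeared   # hypothetical alert name
        # Series that had a value 5 minutes ago but have none now.
        # The result keeps every label of some_metric (instance, job, ...),
        # so they can be used in annotations and routing.
        expr: (some_metric offset 5m) unless some_metric
        annotations:
          summary: "some_metric stopped reporting on {{ $labels.instance }}"
```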
P.S. The query can be simplified to lag(some_metric[1h]) > 5m when using VictoriaMetrics, an alternative Prometheus-like monitoring system I work on. See the docs for the lag() function.
Upvotes: 1
Reputation: 6863
To get the labels, you need a metric that carries all the labels you want. Usually a good choice is up, which also lets you distinguish between a missing metric and an unreachable target.
The rule fires for every target of the job whose up value is 1 (i.e. the target is being scraped successfully), and the unless binary operator suppresses the alert for any instance where the metric is present:
- alert: MissingMetricInFooTarget
  expr: up{job="foo"} == 1 unless on(instance) some_metrics{job="foo"}
  for: 5m
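For context, here is how that rule might sit in a complete rule file, showing where the labels of the missing host surface. Because unless returns series from its left-hand side, the alert carries the labels of up (job and instance), which is what makes {{ $labels.instance }} usable in the notification. The group name, the severity label and the annotation text are hypothetical; the job name foo is taken from the rule above.

```yaml
groups:
  - name: missing-metrics            # hypothetical group name
    rules:
      - alert: MissingMetricInFooTarget
        # up == 1 keeps only reachable targets; unless on(instance) then
        # drops every instance that still exposes some_metrics.
        expr: up{job="foo"} == 1 unless on(instance) some_metrics{job="foo"}
        # Only fire once the metric has been missing for 5 minutes.
        for: 5m
        labels:
          severity: warning          # hypothetical label
        annotations:
          summary: "some_metrics is missing on {{ $labels.instance }}"
```

A single rule like this covers every instance of the job, so there is no need for one rule per host.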
Upvotes: 5