Arash Mousavi

Reputation: 2148

Alerting on a missing metric for many hosts in Alertmanager

I have many servers monitored with Prometheus, and every host exposes the same metrics.

I need an alert rule that fires when a specific metric (such as some_metrics) has been missing on a specific host for 5m.

I checked absent and absent_over_time, but these functions do not return the labels of the missing metric, such as ip or hostname.

I should also add that I don't want to create a rule for each host.

I have searched for a solution but haven't found one.

Is there any workaround?

Upvotes: 2

Views: 6076

Answers (2)

valyala

Reputation: 17784

You can try a PromQL query like the following:

(some_metric offset 5m) unless some_metric

It returns the some_metric series, with all their labels, that had a value 5 minutes ago but have no value now.

This query uses the following PromQL features:

  • offset modifier for querying the data from the past
  • unless operator for returning only the time series that have disappeared during the last 5 minutes
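As a rough sketch, the query above could be wrapped in an alerting rule like the one below. The group name, alert name, severity label, and annotation text are illustrative assumptions, not part of the original answer, and the annotation assumes the series carries an instance label.

```yaml
groups:
  - name: disappeared-metrics          # hypothetical group name
    rules:
      - alert: MetricDisappeared       # hypothetical alert name
        # Series that had a sample 5 minutes ago but have none now.
        expr: (some_metric offset 5m) unless some_metric
        labels:
          severity: warning            # assumed label, adjust to your routing
        annotations:
          summary: "some_metric disappeared on {{ $labels.instance }}"
```

Note that the expression only matches while the offset side still sees old samples, so the alert will likely resolve by itself a few minutes after the series disappears. A larger offset keeps it firing longer, at the cost of a longer delay before a returning series is checked again.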

P.S. The query can be simplified to lag(some_metric[1h]) > 5m when using VictoriaMetrics, an alternative Prometheus-like monitoring system I work on. See the docs for the lag() function.

Upvotes: 1

Michael Doubez

Reputation: 6863

To get the labels, you need a metric that carries all the labels you want. Usually, a good choice is up, which also lets you distinguish between a missing metric and an unreachable target.

The rule alerts if up (for a job) is 1, and the UNLESS binary operator suppresses the alert if the metric is present on the instance:

- alert: MissingMetricInFooTarget
  expr: up{job="foo"} == 1 UNLESS ON(instance) some_metrics{job="foo"}
  for: 5m
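To cover every host without writing one rule per job or per host, the same idea can be generalized by dropping the job matcher and matching on both job and instance. The following is a minimal sketch of a complete rule file under that assumption. The group name, alert name, severity label, and annotation text are hypothetical, and the 5m duration comes from the question.

```yaml
groups:
  - name: missing-metric-alerts        # hypothetical group name
    rules:
      - alert: MissingMetricOnTarget   # hypothetical alert name
        # Targets that are scraped successfully (up == 1) but expose no some_metrics series.
        expr: up == 1 unless on(job, instance) some_metrics
        for: 5m                        # metric must stay missing for 5 minutes
        labels:
          severity: warning            # assumed label, adjust to your routing
        annotations:
          summary: "some_metrics is missing on {{ $labels.instance }} (job {{ $labels.job }})"
```

Because the result series come from up, each alert carries the job and instance labels of the affected target, which is what absent() cannot provide.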

Upvotes: 5
