Gijs de Jong

Reputation: 1027

Is it possible to create a Grafana alert for any unhealthy Prometheus Consul targets?

Prometheus can be set up to collect metrics for Consul targets.

The Targets page of Prometheus shows an overview of the configured targets, including the number of healthy targets out of the total (in the example below, 20 of 22 targets are healthy).

Is there a way to create a Grafana alert that triggers when not all targets are healthy? In the example below, the alert should trigger because only 20 of the 22 targets are up.

I have found prometheus_sd_discovered_targets, which contains the total number of targets, but there does not seem to be a metric that exposes the number of healthy targets.

[Screenshot: Prometheus Targets page showing 20 of 22 targets up]

Upvotes: 0

Views: 1713

Answers (1)

Gijs de Jong

Reputation: 1027

As pointed out by Raven, the up metric can be used for this.

From the docs:

For each instance scrape, Prometheus stores a sample in the following time series:

up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.

The up time series is useful for instance availability monitoring.

A Prometheus query like up < 1 returns only the targets that are currently unhealthy, each with the value 0.
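
The number of healthy targets asked about in the question can also be derived from up. A minimal sketch in PromQL; to restrict it to specific targets, add a label matcher such as job="consul" (a hypothetical job name, not taken from the question):

```
# Number of healthy targets (each healthy target contributes 1)
sum(up)

# Total number of scraped targets
count(up)

# Number of unhealthy targets; evaluates to 0 when all targets are up
count(up) - sum(up)
```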

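From that you can create a Grafana alert with settings like:

  • when last() of query (A, 5m, now) is above -1
  • If no data or all values are null set state to Ok

The threshold is -1 rather than 0 because every series returned by up < 1 has the value 0. When all targets are healthy the query returns no data, which the second setting maps to Ok.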
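
If a threshold of -1 feels unintuitive, one alternative is to alert on a count of unhealthy targets instead. The sketch below assumes the alert condition is changed to "is above 0"; the or vector(0) part makes the query return 0 rather than no data when every target is healthy:

```
# Count of targets whose last scrape failed, or 0 if there are none
count(up == 0) or vector(0)
```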

Upvotes: 3
