Reputation: 1027
Prometheus can be set up to collect metrics for Consul targets.
The Targets page of Prometheus shows an overview of the configured targets, including a count of healthy/total targets (in the example below, 20 of the 22 targets are healthy).
Is there any way to create an alert in Grafana to trigger when not all targets are healthy? In the example below the alert should trigger since not all 22 targets are up.
I have found prometheus_sd_discovered_targets, which contains the total number of targets, but there does not seem to be a metric that exposes the number of healthy targets.
Upvotes: 0
Views: 1713
Reputation: 1027
As pointed out by Raven, the up metric can be used for this.
From the docs:
For each instance scrape, Prometheus stores a sample in the following time series:
up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.
The up time series is useful for instance availability monitoring.
A Prometheus query like up < 1 gives you the targets that are currently unhealthy.
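To get the healthy/total count the question asks about, the PromQL expressions sum(up) and count(up) give the number of healthy targets and the total number of targets. The same arithmetic can be sketched outside Prometheus; the snippet below is a minimal illustration that parses a response in the shape returned by the Prometheus HTTP API for an instant query on up. The sample payload, the job and instance values, and the helper name summarize_up are made up for illustration.

```python
import json

# Sample body in the shape returned by the Prometheus HTTP API for an instant
# query (GET /api/v1/query?query=up). Job and instance values are hypothetical.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "job": "consul", "instance": "10.0.0.1:9100"},
       "value": [1700000000, "1"]},
      {"metric": {"__name__": "up", "job": "consul", "instance": "10.0.0.2:9100"},
       "value": [1700000000, "0"]},
      {"metric": {"__name__": "up", "job": "consul", "instance": "10.0.0.3:9100"},
       "value": [1700000000, "1"]}
    ]
  }
}
""")


def summarize_up(response):
    """Return (healthy, total, unhealthy_instances) for an `up` instant query.

    Hypothetical helper: mirrors sum(up), count(up) and up < 1.
    """
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    samples = response["data"]["result"]
    # Each sample value is a pair [timestamp, "<value as a string>"].
    unhealthy = [
        s["metric"].get("instance", "<unknown>")
        for s in samples
        if float(s["value"][1]) < 1
    ]
    total = len(samples)
    return total - len(unhealthy), total, unhealthy


healthy, total, unhealthy = summarize_up(SAMPLE_RESPONSE)
print(f"{healthy}/{total} targets healthy; unhealthy: {unhealthy}")
```

With the sample payload this reports 2 of 3 targets as healthy and lists 10.0.0.2:9100 as unhealthy.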
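From that you can create a Grafana alert with parameters like:
when last() of query (A, 5m, now) is above -1
If no data or all values are null set state to Ok
The query returns a series with value 0 for each unhealthy target, so the -1 threshold fires as soon as any series is returned. When every target is healthy the query returns no data, which is why the no-data state is set to Ok.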
Upvotes: 3