Reputation: 2580
I have an alert in my Prometheus setup that fires when someMetric > 100 has been true for 5m, and then resends the alert every 24h, according to the configuration below:
prometheus-alert.yml
- alert: TestAlert
  expr: someMetric > 100
  for: 5m
alertmanager-config.yml
repeat_interval: 24h
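For reference, the repeat_interval sits inside the route block of my Alertmanager configuration; roughly like this (the receiver name and grouping intervals are placeholders):
route:
  receiver: default          # placeholder receiver name
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 24h       # resend still-firing alerts every 24h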
However, someMetric behaves in a way where it can be "stable" above 100 (which means an alert is active), but every once in a while it drops below 100 for a single scrape before jumping back above 100. This causes the active alert to become inactive (resolved), then go back to pending and active again after 5 minutes, at which point Prometheus resends the alert, which is what I want to avoid.
Is there a way to configure Prometheus to have something similar to for: 5m, but for the transition active -> inactive (resolved)?
Upvotes: 4
Views: 3145
Reputation: 842
You could use one of the PromQL aggregation-over-time functions to filter out the blips that dip below 100. In your case it sounds like max_over_time might work. The only downside is that it could take a few minutes longer for the alert to end once the value drops below 100 permanently.
- alert: TestAlert
  expr: max_over_time(someMetric[2m]) > 100
  for: 5m
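For completeness, here is a sketch of how this could sit inside a full rule group; the group name, labels, and annotations are placeholders, and the 2m window assumes your scrape interval is short enough that the window always covers at least two samples:
groups:
  - name: test-alerts            # placeholder group name
    rules:
      - alert: TestAlert
        # max_over_time takes the highest sample seen in the last 2m,
        # so a single scrape dipping below 100 does not end the alert.
        expr: max_over_time(someMetric[2m]) > 100
        for: 5m
        labels:
          severity: warning      # placeholder
        annotations:
          summary: "someMetric has been above 100 for more than 5 minutes"
Widening the [2m] window tolerates longer dips, but it also delays resolution by the same amount once the value genuinely stays below 100.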
Upvotes: 1