Reputation: 83
I have a problem with inhibition rules: I need exceptions to them.
For example, we have 50 teams, and every team handles its own alerts. When a data center goes down (e.g. because of network problems), we want to inhibit all alerts except those for Team_1234567890 and Team_ABCDEFGHIJ.
The problem is that Alertmanager doesn't support negative matchers for inhibition: Negative matchers for routing and inhibition #1023 - https://github.com/prometheus/alertmanager/issues/1023
Go's regexp engine, which Prometheus and Alertmanager use, doesn't support "?!" negative lookahead either: https://github.com/google/re2/wiki/Syntax
How can I set up inhibition rules for this case?
Thanks, Denis
Upvotes: 2
Views: 2577
Reputation: 300
I had a similar case. It turned out that job: "!(dev_mapr_alarms_exporters)" did the job for my specific setup, and I was able to separate these two groups. Here is part of my config:
routes:
  - receiver: "jiralert"
    group_wait: 10s
    match_re:
      severity: critical|warning
      job: "!(dev_mapr_alarms_exporters)"
    group_by: ['alertname', 'job']
    group_interval: 5m
    repeat_interval: 30m
    continue: true
  - receiver: "jiralert"
    group_wait: 10s
    match_re:
      job: dev_mapr_alarms_exporters
    group_by: ['alertname', 'job']
    group_interval: 5m
    repeat_interval: 30m
    continue: true
Upvotes: 0
Reputation: 83
Julien Pivotto (roidelapluie on GitHub) has written a solution for this use case: https://github.com/prometheus/alertmanager/issues/1023#issuecomment-671851280
You could use Prometheus alert relabeling to add a marker label to the two teams' alerts:
alerting:
  alert_relabel_configs:
    - source_labels: [team]
      regex: Team_1234567890|Team_ABCDEFGHIJ
      target_label: dc_team_alert
      replacement: "yes"
and inhibit only the alerts that lack it:
target_match:
  dc_team_alert: ""
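Putting the pieces together, a minimal sketch of a complete inhibit rule might look like the following. The alert name `DatacenterDown` and the `datacenter` label are assumptions for illustration only; they do not come from the question or from the linked comment. The rule relies on Alertmanager treating an empty matcher value as "label not set", so only alerts without `dc_team_alert` are inhibited.

```yaml
# alertmanager.yml (sketch; DatacenterDown and datacenter are hypothetical names)
inhibit_rules:
  - source_match:
      alertname: DatacenterDown   # assumed name of the "data center is down" alert
    target_match:
      dc_team_alert: ""           # empty value matches alerts that lack this label
    equal: ['datacenter']         # assumed label shared by source and target alerts
```

Alerts from Team_1234567890 and Team_ABCDEFGHIJ carry `dc_team_alert: "yes"` after relabeling, so they do not match the target side and keep firing.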
Upvotes: 0
Reputation: 1204
Until negative matching is implemented in Alertmanager, you need to add dedicated routes for those two teams and inhibit the other teams as normal.
Alternatively, you can go the silence route with amtool: https://github.com/prometheus/alertmanager/blob/master/README.md#amtool
A more detailed man page is available here: https://manpages.debian.org/testing/prometheus-alertmanager/amtool.1.en.html
You can use amtool to add a silence that mutes all alerts for the other 48 teams as soon as the first network-down alert fires.
You do need to be creative about when to add and remove the silence.
Unless you already have a list of the teams that don't want to be alert-stormed, you also need to run a negative-match PromQL query to return those 48 team names and join them with |:
amtool silence add alertname=~".*" instance=~"team1|team2..."
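On the negative-matching point at the start of this answer: as far as I know, Alertmanager releases from 0.22 onward accept a `matchers` syntax that includes `!=` and `!~`, which would express the exception directly. The sketch below assumes the alerts carry a `team` label; the alert name `DatacenterDown` and the `datacenter` label are hypothetical and only there to make the fragment complete.

```yaml
# alertmanager.yml (sketch for newer Alertmanager versions; names are assumptions)
inhibit_rules:
  - source_matchers:
      - 'alertname = "DatacenterDown"'                    # hypothetical source alert
    target_matchers:
      - 'team !~ "Team_1234567890|Team_ABCDEFGHIJ"'       # inhibit every team except these two
    equal: ['datacenter']                                 # hypothetical shared label
```

Check the configuration reference for the version you run before relying on this, since older releases only understand `source_match` / `target_match`.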
Upvotes: 1