Denis

Reputation: 83

Inhibition with exceptions (negative matchers)

I'm having trouble with inhibition rules because I need exceptions to them.

For example, we have 50 teams, and each team handles its own alerts. When a data center goes down (e.g. because of network problems), we want to inhibit all alerts except those for Team_1234567890 and Team_ABCDEFGHIJ.

The problem is that Alertmanager doesn't support negative matchers for inhibition; see "Negative matchers for routing and inhibition" (#1023): https://github.com/prometheus/alertmanager/issues/1023

Go, and therefore Prometheus/Alertmanager, doesn't support the "?!" negative lookahead in regular expressions: https://github.com/google/re2/wiki/Syntax

How can I set up inhibition rules for this example?

Thanks, Denis

Upvotes: 2

Views: 2577

Answers (3)

Martin Nikolov

Reputation: 300

I had a similar case, and it turned out that job: "!(dev_mapr_alarms_exporters)" did the job for my specific setup. I was able to separate these two groups. Here is part of my config:

routes:
- receiver: "jiralert"
  group_wait: 10s
  match_re:
    severity: critical|warning
    job: "!(dev_mapr_alarms_exporters)"
  group_by: ['alertname', 'job']
  group_interval: 5m
  repeat_interval: 30m
  continue: true
- receiver: "jiralert"
  group_wait: 10s
  match_re:
    job: dev_mapr_alarms_exporters
  group_by: ['alertname', 'job']
  group_interval: 5m
  repeat_interval: 30m
  continue: true

Upvotes: 0

Denis

Reputation: 83

Julien Pivotto (roidelapluie on GitHub) has written a solution to this use case: https://github.com/prometheus/alertmanager/issues/1023#issuecomment-671851280

You can use alert relabeling in the Prometheus configuration to tag the alerts of the two exempt teams:

alerting:
  alert_relabel_configs:
  - source_labels: [team]
    regex: Team_1234567890|Team_ABCDEFGHIJ
    target_label: dc_team_alert
    replacement: "yes"

and then, in the Alertmanager inhibit rule, target only the alerts where that label is empty:

target_match:
  dc_team_alert: ""
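
Putting the two fragments together, a complete inhibit rule might look like the sketch below. The alert name `DatacenterDown` and the `datacenter` label are hypothetical placeholders, not taken from the linked comment; substitute whatever your data-center alert actually uses. The rule relies on Alertmanager treating a missing label as an empty value, so only alerts that were not tagged with `dc_team_alert` are inhibited.

```yaml
# alertmanager.yml (sketch; alert and label names are assumptions)
inhibit_rules:
  - source_match:
      alertname: DatacenterDown   # hypothetical name of the data-center alert
    target_match:
      dc_team_alert: ""           # matches alerts without the label, i.e. all teams except the two relabeled ones
    equal: ['datacenter']         # hypothetical label; inhibit only within the affected data center
```

Alerts from Team_1234567890 and Team_ABCDEFGHIJ carry `dc_team_alert: "yes"` after relabeling, so they do not match the target and keep firing.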

Upvotes: 0

Hang

Reputation: 1204

Until negative matching is implemented in Alertmanager, you need to add dedicated routes for those two teams and inhibit the other teams as usual.
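
For readers on a newer Alertmanager, the workaround may be unnecessary: as far as I know, the `matchers` syntax introduced in release 0.22 accepts `!=` and `!~`, so the exception can be written directly in the inhibit rule. A minimal sketch, assuming such a version; the `DatacenterDown` alert name and the `datacenter` label are hypothetical and do not come from the question:

```yaml
# alertmanager.yml (sketch; requires a release with the matchers syntax)
inhibit_rules:
  - source_matchers:
      - 'alertname = DatacenterDown'                   # hypothetical data-center alert
    target_matchers:
      - 'team !~ "Team_1234567890|Team_ABCDEFGHIJ"'    # every alert whose team is not one of the two
    equal: ['datacenter']                              # hypothetical label shared by source and target
```

Note that alerts with no `team` label at all also satisfy the `!~` matcher and would be inhibited.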

Or, if you want to go the silence route, see amtool: https://github.com/prometheus/alertmanager/blob/master/README.md#amtool

A more detailed man page can be found here: https://manpages.debian.org/testing/prometheus-alertmanager/amtool.1.en.html

You can add a silence using amtool to snooze all alerts for the other 48 (50 - 2) teams as soon as the first network-down alert is triggered.

You DO need to be creative about when to insert and remove the silence.

Unless you already have a list of teams who don't want to be alert-stormed, you DO need to run a negative-match PromQL query to return those 48 team names and join them with |:

amtool silence add alertname=~".*" instance=~"team1|team2..."

Upvotes: 1
