Pradnya Alchetti

Reputation: 175

Deactivate prometheus alerts after 10mins

I have a Kubernetes cluster and use Prometheus for monitoring and alerting. Alertmanager keeps repeating alerts for as long as they are active. I want to configure Alertmanager to send each alert to Slack only once, and to repeat it only if the state of the alert changes.

I tried generating alerts only during a specific time window, as follows:

(kube_pod_container_status_restarts_total > 3) * ((time() % 86400 / 3600 > bool 3) == bool (time() % 86400 / 3600 < bool 4))

but this didn't work for me.

Prometheus server config is as follows:

prometheus-server.yml

alert: PodRestartAlert
expr: kube_pod_container_status_restarts_total > 3
for: 5m
labels:
  severity:
annotations:
  description: ""
  summary: 'The pods that are restarted more than 3 times'

Alertmanager config is as follows:

global:
  slack_api_url: "http://"
receivers:
- name: default-receiver
  slack_configs:
  - channel: '#abc'
    text: Prometheus Alert generated
route:
  group_by:
  - alertname
  - datacenter
  - app
  group_interval: 5m
  receiver: default-receiver
  repeat_interval: 0

I am trying to achieve something as below:

If 10 pods have initially restarted more than 3 times, Alertmanager should send a single alert to Slack.

If, after a day or two, the number of restarted pods increases to 20, only then should Alertmanager send another alert to Slack, and again only once.

Any suggestions on what I could try or change would be of great help.

Thanks in advance!

Upvotes: 2

Views: 3330

Answers (1)

Eduardo Baitello

Reputation: 11346

You can use slack_configs.title to create a generic message, and then slack_configs.text to range through the firing/resolved alerts, so that multiple alerts of the same type arrive in a single Slack message.

Also, make sure to set slack_configs.send_resolved: true to be notified about resolved alerts.

e.g.:

alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by:
        - alertname
        - datacenter
        - app
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'default-receiver'
    receivers:
    - name: 'default-receiver'
      slack_configs:
      - channel: '#abc'
        send_resolved: true
        title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] Monitoring Event Notification'
        text: |-
          {{ range .Alerts }}
            *Alert:* {{ .Labels.alertname }} - `{{ .Labels.severity }}`
            *Description:* {{ .Annotations.summary }}
            *Graph:* <{{ .GeneratorURL }}|:chart_with_upwards_trend:> *Runbook:* <{{ .Annotations.runbook_url }}|:spiral_note_pad:>
            *Details:*
            {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
          {{ end }}

The above example also ranges through .Labels.SortedPairs, creating a complete "Details:" section that lists every label attached to each alert.
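The template above reads `.Labels.severity`, `.Annotations.summary` and `.Annotations.runbook_url`, so the alerting rule needs to set those fields for the message to render fully. A minimal sketch of such a rule follows; the group name, the `warning` severity value and the runbook URL are placeholders I have assumed, not values from the question:

```yaml
# Sketch of a Prometheus rule file. Values marked "assumed" or "placeholder"
# are illustrative only and should be replaced with your own.
groups:
  - name: pod-restarts                # assumed group name
    rules:
      - alert: PodRestartAlert
        expr: kube_pod_container_status_restarts_total > 3
        for: 5m
        labels:
          severity: warning           # assumed value; shown by {{ .Labels.severity }}
        annotations:
          # Shown under "Description:" in the Slack template
          summary: 'Pod {{ $labels.pod }} has restarted more than 3 times'
          # Placeholder URL; linked from "Runbook:" in the Slack template
          runbook_url: 'https://example.com/runbooks/pod-restarts'
```

With an empty `severity` label, as in the rule from the question, the corresponding part of the Slack message would simply render blank.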

The alerts should look like this: [image: slack_alert_example]


Upvotes: 1
