Reputation: 31
We have several alerts and we want to combine them into one big alert covering CPU, memory and disk IO.
For example:
rules:
  - alert: OutOfMemory
    annotations:
      description: "Node memory is filling up (< 5% left)\n VALUE = {{ $value }}"
      summary: Out of memory (instance {{ $labels.instance }})
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 5
    for: 5m
    labels:
      severity: warning
and
  - alert: HighCpuLoad
    annotations:
      description: "CPU load is > 90%\n VALUE = {{ $value }}"
      summary: High CPU load (instance {{ $labels.instance }})
    expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
We can't figure out what those alerts would look like when combined with the "and" operator plus vector matching. Can someone help us out here?
Best regards
Upvotes: 0
Views: 2834
Reputation: 6863
You would have to use a vector matching clause which, in simple cases such as yours, specifies which labels must match on both sides of the operator.
In the case of the node exporter it would be:
(<OutOfMemory expression>) AND ON(instance) (<HighCpuLoad expression>)
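Spelled out with the two expressions from the question, the combined rule could look roughly like the sketch below. The alert name NodeOutOfMemoryAndHighCpu and the summary text are hypothetical, and the rule has not been run against a real Prometheus. Note that "and" keeps the value and labels of the left-hand side, so {{ $value }} would be the memory percentage here.

```yaml
rules:
  # Hypothetical alert name; fires only when both conditions hold for the same instance.
  - alert: NodeOutOfMemoryAndHighCpu
    expr: |
      (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 5)
      and on(instance)
      (100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80)
    for: 5m
    labels:
      severity: warning
    annotations:
      # Illustrative summary text, not taken from the original rules.
      summary: Out of memory and high CPU load (instance {{ $labels.instance }})
```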
From a usability point of view, I would rather keep multiple alerts that are not sent to your alerting system (route them to a black hole in Alertmanager) and then use the ALERTS metric to trigger your big alert. It will allow you to have:
- more flexible combinations (for example an OR clause)
- different for statements: you may not want the same for duration for high CPU and for a memory outage.
I have not tested it, but it would look like the following:
rules:
  - alert: NodeInTrouble
    # ALERTS also exposes pending alerts, so count only the firing ones
    expr: sum by (instance) (ALERTS{alertname=~"OutOfMemory|HighCpuLoad", alertstate="firing"}) == 2
    for: 1m
    labels:
      severity: warning
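The "black hole" in Alertmanager could be sketched as a receiver with no notification settings, plus a child route that sends the two sub-alerts to it. This is an untested sketch: the receiver names default and blackhole are placeholders, the default receiver would carry your real notification settings, and the matchers syntax assumes a reasonably recent Alertmanager (older versions use match_re instead).

```yaml
route:
  # Placeholder name for your real receiver; NodeInTrouble ends up here.
  receiver: default
  routes:
    # Sub-alerts are routed to a receiver that notifies nobody.
    - receiver: blackhole
      matchers:
        - alertname=~"OutOfMemory|HighCpuLoad"
receivers:
  - name: default
  # No notification configs, so alerts routed here are silently dropped.
  - name: blackhole
```

The sub-alerts still show up in Prometheus and in the ALERTS metric; only their notifications are dropped.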
Upvotes: 1