Reputation: 539
I'm setting up Prometheus alerts (using elasticsearch_exporter) for two Elasticsearch clusters, one with 8 nodes and one with 3 nodes. I want to send an alert when either cluster loses a node, but right now every rule applies to both clusters, so that isn't possible.
prometheus.yml file:
global:
  scrape_interval: 10s

rule_files:
  - alert.rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: cluster1
    scrape_interval: 30s
    scrape_timeout: 30s
    metrics_path: "/metrics"
    static_configs:
      - targets: ['xxx1:9114']
        labels:
          service: cluster1
  - job_name: cluster2
    scrape_interval: 30s
    scrape_timeout: 30s
    metrics_path: "/metrics"
    static_configs:
      - targets: ['xxx2:9114']
        labels:
          service: cluster2
alert.rules.yml file:
groups:
  - name: alert.rules
    rules:
      - alert: ElasticsearchLostNode
        expr: elasticsearch_cluster_health_number_of_nodes < 8
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: Elasticsearch Healthy Nodes (instance {{ $labels.instance }})
          description: Number of healthy nodes is less than 8
...
Of course number_of_nodes < 8 will always be true for the small cluster, and if I set < 3 instead, the alert will not be triggered when the big cluster loses a node.
Is there a way to exempt one specific rule from one specific job_name, or to define rules A that apply only to job_name A and rules B that apply only to job_name B?
Upvotes: 0
Views: 2092
Reputation: 22321
Yes, you can create one rule per job in the alert.rules.yml file, restricting each expression with a job label matcher:
groups:
  - name: alert.rules
    rules:
      - alert: ElasticsearchLostNode1
        expr: elasticsearch_cluster_health_number_of_nodes{job="cluster1"} < 8
        ...
      - alert: ElasticsearchLostNode2
        expr: elasticsearch_cluster_health_number_of_nodes{job="cluster2"} < 3
        ...
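If you would rather not hardcode the expected node count for each cluster, a possible alternative (a sketch, not part of the original answer) is a single rule that compares the current node count against its recent maximum via PromQL's max_over_time, so it fires for whichever cluster drops below the size it recently had. This assumes each cluster was at full strength at some point in the lookback window (1h here):

groups:
  - name: alert.rules
    rules:
      - alert: ElasticsearchLostNode
        # Fires when a cluster reports fewer nodes than its peak over the last hour.
        expr: elasticsearch_cluster_health_number_of_nodes < max_over_time(elasticsearch_cluster_health_number_of_nodes[1h])
        for: 1m
        labels:
          severity: warning

The trade-off is that the alert stops firing once the reduced node count has persisted for the full lookback window, since the recent maximum then equals the current value.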
Upvotes: 3