Reputation: 8801
I am trying to create an alert that fires whenever the number of nodes increases or decreases.
Currently, I can get the node count using this:
count(kube_node_info)
But I want to get the node count for 5 min and for 1 min so that I can subtract one from the other and send the alert. I don't know how to get the node count for a 5m time frame.
Upvotes: 1
Views: 11859
Reputation: 56
In your case, I would create two different alerts.
(sum(kube_node_info) > sum(kube_node_info offset 1d)) # more nodes than 1 day ago
(sum(kube_node_info) < sum(kube_node_info offset 1d)) # fewer nodes than 1 day ago
Alternatively, you could do something like:
# alert --> "The number of nodes has changed"
(sum(kube_node_info) > sum(kube_node_info offset 1d)) or (sum(kube_node_info) < sum(kube_node_info offset 1d))
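Since "greater than or less than" is the same as "not equal", the combined condition can probably be shortened. A minimal sketch, assuming the same 1d offset as above and that kube_node_info had samples a day ago (if either side is empty, the comparison returns nothing):

```promql
# alert --> "The number of nodes has changed"
sum(kube_node_info) != sum(kube_node_info offset 1d)
```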
Meanwhile, kube_node_info is delivered by kube-state-metrics, meaning that these metrics are generated by the Kubernetes API (see List of nodes instance prometheus). Another way to monitor the number of running nodes might be as follows:
# alert --> "fewer nodes than 1 day ago"
(count(up{instance=~"node.*"}) < count(count by (node) (min_over_time(up{instance=~"node.*"}[1d]))))
Upvotes: 2
Reputation: 1
Does the following work?
abs(sum(kube_node_info) - sum(kube_node_info offset 1m)) > 0
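For the 5-minute window mentioned in the question, the same idea might look like the sketch below. The 5m offset is only an illustrative value, and it assumes kube_node_info was being scraped 5 minutes ago. Note that count() over an empty result returns nothing rather than 0, so a drop to zero nodes would not be caught by these expressions.

```promql
# signed change in node count compared with 5 minutes ago
count(kube_node_info) - count(kube_node_info offset 5m)

# returns a result only when the count differs, in either direction
count(kube_node_info) != count(kube_node_info offset 5m)
```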
Upvotes: 0
Reputation: 22331
You can use either the delta() function or the offset modifier to compare the current node count with its value some time ago.
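A rough sketch of the delta variant: delta() expects a range vector, so the aggregated count has to be wrapped in a subquery. The 5m range and 1m resolution here are assumptions, not values from the question.

```promql
# change in node count over the last 5 minutes,
# with the inner count() evaluated at 1m steps
delta(count(kube_node_info)[5m:1m]) != 0
```

Keep in mind that delta() extrapolates to cover the full range, so the result is not guaranteed to be an integer; comparing against 0 still works for detecting a change. The offset form is the simpler option if you want the exact difference.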
Upvotes: 0