Vikas Rathore

Reputation: 8801

How to get the number of running nodes in Prometheus

I am trying to create an alert that fires when the number of nodes increases or decreases.

Currently, I can get the node count using this:

count(kube_node_info)

But I want to get the node count for 5 minutes and for 1 minute so that I can subtract one from the other and send the alert. I don't know how to get the node count for a 5m time frame.

Upvotes: 1

Views: 11859

Answers (4)

snowpeak

Reputation: 867

count(count by (node) (kube_node_info))

Upvotes: 0

Kislow

Reputation: 56

In your case, I would create two different alerts.

(sum(kube_node_info) > sum(kube_node_info offset 1d)) # more nodes than 1 day ago
(sum(kube_node_info) < sum(kube_node_info offset 1d)) # fewer nodes than 1 day ago

Alternatively, you could do something like:

# alert --> "The number of nodes has changed"
(sum(kube_node_info) > sum(kube_node_info offset 1d)) or (sum(kube_node_info) < sum(kube_node_info offset 1d))
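
If the direction of the change does not matter, the two comparisons can probably be collapsed into a single inequality. This is only a sketch; it should behave the same as the expression above, since both sides are single unlabeled values:

# alert --> "The number of nodes has changed"
sum(kube_node_info) != sum(kube_node_info offset 1d)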

Note that kube_node_info is exposed by kube-state-metrics, so it reflects what the Kubernetes API reports (see "List of nodes instance prometheus"). Another way to monitor the number of running nodes is to use the up metric of the scrape targets, assuming their instance labels start with "node" and the series carry a node label:

# alert --> "fewer nodes than 1 day ago"
(count(up{instance=~"node.*"}) < count(count by (node) (min_over_time(up{instance=~"node.*"}[1d]))))

Upvotes: 2

Shwan

Reputation: 1

Does the following work?

abs(sum(kube_node_info) - sum(kube_node_info offset 1m)) > 0

Upvotes: 0

You can use either the delta function or the offset modifier to get what you want.
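
As a rough sketch of both options, assuming the node count comes from count(kube_node_info) as in the question (the 5m window and the 1m subquery step are illustrative choices, not values taken from this answer):

# offset: current count minus the count 5 minutes ago; a non-zero result means the count changed
count(kube_node_info) - count(kube_node_info offset 5m) != 0

# delta: needs a range vector, so the aggregated count is wrapped in a subquery
delta(count(kube_node_info)[5m:1m]) != 0

Since delta() extrapolates its result to cover the full window, the value may not be a whole number, so it is probably safer to compare it with 0 than to read it as an exact node difference.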

Upvotes: 0
