Reputation: 2932
We had a nice Prometheus query involving the load of a node and its number of CPU cores:
(avg(node_load1{instance=~"$instance"}) by (instance) / sum(machine_cpu_cores{instance=~"$instance"}) by (instance)) * 100
But "machine_cpu_cores" has gone away, and instead I found "kube_node_status_capacity{resource="cpu"}". I want to be able to group by node and also filter by node.
The problem is that on the metric "node_load1" the label for a node is "instance" and on the metric "kube_node_status_capacity" it is "node". I could not do the desired division operation on mismatching labels. The solution I found was to use 'label_replace' on one metric:
(avg(label_replace(node_load1{instance=~"$instance"}, "node", "$1", "instance", "(.*)")) by (node) / sum(kube_node_status_capacity{resource="cpu", node=~"$instance"}) by (node)) * 100
Is this a good solution? Are there better solutions?
Upvotes: 1
Views: 2510
Reputation: 56
Sure, using label_replace is absolutely appropriate. Your labels come from different sources and therefore use different label names for roughly the same concept. This is one occasion where label_replace is very handy!
There is a different approach for getting CPU count which is quite common. You can use a query like this:
count by (instance) (node_cpu_seconds_total{mode="idle"})
Since node-exporter creates a node_cpu_seconds_total time series for each CPU/mode, its count per instance for a specific mode (here we use "idle" arbitrarily) is the count of CPUs.
I'm not sure whether kube_node_status_capacity represents the total CPU count for the node, or just the CPU allocated to run Kubernetes pods (a node can reserve some CPU). So using this query might be more accurate, depending on your needs.
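Combined with the load query from the question, the CPU-count approach could look roughly like the sketch below. It assumes node_load1 and node_cpu_seconds_total are scraped from the same node-exporter target, so both already carry the same "instance" label and no label_replace is needed. $instance is the dashboard variable from the question.

```promql
# Load per CPU core as a percentage, per instance.
# Assumption: both metrics come from the same node-exporter target,
# so the "instance" label matches on both sides of the division.
(
  avg by (instance) (node_load1{instance=~"$instance"})
  /
  count by (instance) (node_cpu_seconds_total{mode="idle", instance=~"$instance"})
) * 100
```

Since both sides are aggregated down to just the "instance" label, the division matches one-to-one and the result can still be grouped and filtered by instance.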
Upvotes: 1