Ronald

Reputation: 2932

Is it a good solution to use "label_replace" in a Prometheus query when doing math operations on two metrics with different labels for the same value?

We had a nice Prometheus query involving the load of a node and its number of CPU cores:

(avg(node_load1{instance=~"$instance"}) by (instance) / sum(machine_cpu_cores{instance=~"$instance"}) by (instance)) * 100

But the "machine_cpu_cores" metric has gone away, and instead I found "kube_node_status_capacity{resource="cpu"}". I want to be able to group by node and also filter by node.

The problem is that on the metric "node_load1" the label for a node is "instance" and on the metric "kube_node_status_capacity" it is "node". Because the label names differ, the two sides of the division never match, so I could not do the desired operation. The solution I found was to use 'label_replace' on one metric:

(avg(label_replace(node_load1{instance=~"$instance"}, "node", "$1", "instance", "(.*)")) by (node) / sum(kube_node_status_capacity{resource="cpu", node=~"$instance"}) by (node)) * 100

Is this a good solution? Are there better solutions?

Upvotes: 1

Views: 2510

Answers (1)

Jake Utley

Reputation: 56

Sure, using a label_replace is absolutely appropriate. Your labels are coming from different sources, and are therefore using different label names for roughly the same concept. This is one occasion where label_replace is very handy!
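One thing to watch for: in many setups the node-exporter "instance" label includes a port (for example "host:9100"), while the "node" label from kube-state-metrics does not. Your "(.*)" pattern evidently works for you, so this may not apply, but if the values ever differ, a variant along these lines could strip the port. This is only a sketch that assumes a host:port format for "instance":

```promql
(
  avg by (node) (
    # Copy the host part of "instance" into a new "node" label.
    # The regex is fully anchored; group 1 is everything before an optional ":port".
    label_replace(node_load1{instance=~"$instance"}, "node", "$1", "instance", "([^:]+)(:[0-9]+)?")
  )
  /
  # No node filter here: the division only keeps nodes present on both sides.
  sum by (node) (kube_node_status_capacity{resource="cpu"})
) * 100
```

The left side still filters on "$instance", so the dashboard variable keeps working even though the right side is unfiltered.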

There is a different approach for getting CPU count which is quite common. You can use a query like this:

count by (instance) (node_cpu_seconds_total{mode="idle"})

Since node-exporter creates a node_cpu_seconds_total time series for each CPU/mode, its count per instance for a specific mode (here we use "idle" arbitrarily) is the count of CPUs.
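Plugged into your original query, that would look roughly like the following. Both metrics come from node-exporter, so they should share the same "instance" label and no label_replace is needed; I'm assuming your "$instance" variable matches those values as it did before:

```promql
(
  # 1-minute load per instance
  avg by (instance) (node_load1{instance=~"$instance"})
  /
  # one "idle" series per CPU, so the count is the number of CPUs
  count by (instance) (node_cpu_seconds_total{mode="idle", instance=~"$instance"})
) * 100
```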

I'm not sure whether kube_node_status_capacity represents the total CPU count for the node, or just the CPU allocated to run Kubernetes pods (a node can reserve some CPU). So the count query above might be more accurate, depending on your needs.
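If you want to check this on your own cluster, kube-state-metrics should also expose a kube_node_status_allocatable metric (I'm assuming your version has it, with the same "node" and "resource" labels). As far as I understand it, capacity is the node total and allocatable is what is left for pods, so the difference would be the reserved CPU:

```promql
# CPU cores held back from pod scheduling, per node (assumes both metrics exist)
sum by (node) (kube_node_status_capacity{resource="cpu"})
-
sum by (node) (kube_node_status_allocatable{resource="cpu"})
```

If this returns 0 for every node, nothing is reserved and the two metrics are interchangeable for your purposes.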

Upvotes: 1
