Tony.H

Reputation: 761

PromQL to correctly get CPU usage percentage

I am struggling to understand some concepts regarding the cAdvisor metrics (as scraped by Prometheus), specifically the CPU usage metrics.

It provides the following three metrics concerning CPU usage:

container_cpu_user_seconds_total
container_cpu_system_seconds_total
container_cpu_usage_seconds_total

I assumed that taking the rate of each metric and multiplying it by 100 would give the percentage of CPU used in the respective mode. For example, with the following PromQL:

sum by (pod) (container_cpu_usage_seconds_total)

However, the sum of the cpu_user and cpu_system percentage values does not add up to the cpu_usage percentage value. If this difference is expected, what does it represent?

Upvotes: 1

Views: 6706

Answers (2)

valyala

Reputation: 17830

It is incorrect to sum *_total metrics directly. They are counters, so you should apply rate to them first and then sum the result:

sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))

This query should return the average number of CPU cores used by each pod over the last 5 minutes. Multiply the result by 100 to express it as a percentage of a single core.
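To make the arithmetic behind that query concrete, here is a minimal Python sketch of what rate followed by sum computes. It is not how Prometheus is implemented: the helper `simple_rate` is hypothetical, it ignores counter resets and the extrapolation Prometheus applies at window boundaries, and the sample values and container names are made up for illustration.

```python
def simple_rate(samples):
    """Per-second increase of a counter between the first and last sample.

    Simplified on purpose: assumes no counter resets and does no
    extrapolation, unlike the real PromQL rate() function.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical (timestamp_seconds, container_cpu_usage_seconds_total) samples
# for two containers of the same pod, taken 300 seconds (5 minutes) apart.
pod_samples = {
    "app": [(0, 1000.0), (300, 1150.0)],    # 150 CPU-seconds consumed
    "sidecar": [(0, 40.0), (300, 55.0)],    # 15 CPU-seconds consumed
}

# rate() per series first, then sum by pod: the result is in CPU cores.
cores_used = sum(simple_rate(s) for s in pod_samples.values())

# Multiplying by 100 gives a percentage of one core, not of the whole node.
percent_of_one_core = cores_used * 100

print(f"cores used: {cores_used:.2f}")
print(f"percent of one core: {percent_of_one_core:.0f}%")
```

A CPU-seconds counter that grows by 150 over 300 wall-clock seconds means the container kept half a core busy on average, which is why the rate is read directly as "cores".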

See this blog post explaining why sum(rate()) should be used instead of rate(sum())
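The linked post is not reproduced here, but one commonly cited reason is counter resets: rate can only detect a reset on an individual series, and summing first hides it. The following Python sketch illustrates that under stated assumptions. The `increase` and `rate` helpers are hypothetical simplifications (a drop in value is treated as a reset to zero, and no extrapolation is done), and the sample values are invented.

```python
def increase(values):
    """Total increase of a counter across consecutive samples.

    Any drop in value is treated as a reset to zero, so the new value
    itself is counted as the increase for that step. Extrapolation is
    omitted for simplicity.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur if cur < prev else cur - prev
    return total


def rate(values, step_seconds):
    """Per-second rate over evenly spaced samples."""
    return increase(values) / (step_seconds * (len(values) - 1))


# Hypothetical samples taken 60 seconds apart.
a = [100.0, 130.0, 160.0, 190.0]   # steady counter
b = [500.0, 530.0, 10.0, 40.0]     # container restarted, counter reset

# Correct order: rate per series, then sum.
sum_of_rates = rate(a, 60) + rate(b, 60)

# Wrong order: sum the raw counters, then take the rate.
summed = [x + y for x, y in zip(a, b)]
rate_of_sum = rate(summed, 60)

print(f"sum of rates: {sum_of_rates:.3f} cores")
print(f"rate of sum:  {rate_of_sum:.3f} cores")
```

In the summed series the reset of `b` shows up as a drop from 660 to 170, which is misread as a reset of the whole aggregate, so the result is inflated well above the true usage.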

Upvotes: 1

I don't know exactly how cAdvisor works, but drawing a parallel with how Node_Exporter does it, I think there are more CPU modes besides "user" and "system" that add up to the total CPU usage.

Look at all the CPU modes Node_Exporter exposes:

# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 5.96744154e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 6523.35
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 936.5
node_cpu_seconds_total{cpu="0",mode="softirq"} 8087.39
node_cpu_seconds_total{cpu="0",mode="steal"} 21.29
node_cpu_seconds_total{cpu="0",mode="system"} 33360.63
node_cpu_seconds_total{cpu="0",mode="user"} 862602.25
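As a rough check on the sample above, this Python sketch adds up the listed modes and compares "user" plus "system" with all non-idle time. It only uses the values shown for cpu="0", which are cumulative since boot rather than a rate over a recent window, and it says nothing definite about how cAdvisor accounts for CPU time. Whether iowait should count as usage is a judgment call; here it is simply grouped with the other non-idle modes.

```python
# Values copied from the node_cpu_seconds_total sample above (cpu="0").
modes = {
    "idle": 5.96744154e+06,
    "iowait": 6523.35,
    "irq": 0.0,
    "nice": 936.5,
    "softirq": 8087.39,
    "steal": 21.29,
    "system": 33360.63,
    "user": 862602.25,
}

total = sum(modes.values())
non_idle = total - modes["idle"]
user_plus_system = modes["user"] + modes["system"]

# Time spent in modes other than idle, user, and system.
other_modes = non_idle - user_plus_system

print(f"non-idle seconds:      {non_idle:.2f}")
print(f"user + system seconds: {user_plus_system:.2f}")
print(f"other modes seconds:   {other_modes:.2f}")
print(f"share of non-idle time outside user/system: {other_modes / non_idle:.2%}")
```

For this sample the gap is small but not zero, which matches the idea that user and system alone need not add up to the total.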

Upvotes: 1
