Reputation: 761
I am struggling to understand some concepts regarding the cAdvisor metrics (when scraped by Prometheus), specifically the CPU usage metrics.
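It provides the following three metric types concerning CPU usage:
container_cpu_usage_seconds_total
container_cpu_user_seconds_total
container_cpu_system_seconds_total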
I expected to get the percentage (after multiplying by 100) of the respective CPU time when I take the rate of each of them. For example, with the following PromQL:
sum by (pod) (container_cpu_usage_seconds_total)
However, the sum of the cpu_user and cpu_system percentage values does not add up to the percentage value of cpu_usage. If this difference is expected, what does it represent?
Upvotes: 1
Views: 6706
Reputation: 17830
It is incorrect to sum *_total metrics - you should apply rate to them and then sum the result:
sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))
This query should return the average number of CPU cores used by each pod over the last 5 minutes.
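To make the arithmetic behind that query concrete, here is a minimal Python sketch of what it computes. The pod and container names and the sample values are made up for illustration, and simple_rate is only a simplified stand-in for PromQL's rate(), which additionally extrapolates to the window edges and compensates for counter resets.

```python
# Simplified stand-in for PromQL rate(): per-second increase between the
# first and last sample in the window (no extrapolation, no reset handling).
def simple_rate(samples):
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Hypothetical samples: (timestamp_seconds, container_cpu_usage_seconds_total)
# for two containers of the same pod over a 5-minute window.
window = {
    ("web-0", "app"): [(0, 1000.0), (300, 1150.0)],
    ("web-0", "sidecar"): [(0, 200.0), (300, 230.0)],
}

# Equivalent of: sum by (pod) (rate(...[5m]))
cores_by_pod = {}
for (pod, _container), samples in window.items():
    cores_by_pod[pod] = cores_by_pod.get(pod, 0.0) + simple_rate(samples)

# Multiplying by 100 turns "cores used" into "percent of one core".
percent_by_pod = {pod: cores * 100 for pod, cores in cores_by_pod.items()}
print(cores_by_pod, percent_by_pod)
```

With these made-up numbers the pod consumed 180 CPU-seconds in 300 seconds, which is roughly 0.6 cores, or about 60% of one core. Summing the raw counters instead would only give the total CPU-seconds accumulated since the containers started.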
See this blog post explaining why sum(rate()) should be used instead of rate(sum()).
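One way to see why the order matters is to look at what happens when a single container restarts and its counter starts again from zero. The sketch below uses made-up samples, and rate_with_resets is a hypothetical helper that only mirrors the idea behind rate()'s reset handling, not Prometheus's exact algorithm.

```python
def rate_with_resets(samples):
    """Per-second increase over the window; a drop in value is treated as a
    counter reset, so the post-reset value counts as the increase."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])

# Hypothetical counters for two containers, each using about 0.1 cores.
# Container b restarts at t=120, so its counter starts again from zero.
a = [(0, 100.0), (60, 106.0), (120, 112.0), (180, 118.0)]
b = [(0, 500.0), (60, 506.0), (120, 0.0), (180, 6.0)]

# sum(rate()): handle the reset per series, then add the rates.
sum_of_rates = rate_with_resets(a) + rate_with_resets(b)

# rate(sum()): add the raw counters first, then take the rate. The reset of b
# now looks like a reset of the whole sum, and the remaining value of a is
# wrongly counted as fresh increase.
summed = [(t, va + vb) for (t, va), (_, vb) in zip(a, b)]
rate_of_sum = rate_with_resets(summed)

print(sum_of_rates, rate_of_sum)
```

In this toy case the per-series approach stays close to the true usage, while the rate of the summed counter comes out several times too high.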
Upvotes: 1
Reputation: 22321
I don't know exactly how cAdvisor works, but drawing a parallel with how node_exporter works, I think there are more CPU modes besides "user" and "system" that add up to the total CPU usage.
Look at all the CPU modes that node_exporter exposes:
# HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 5.96744154e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 6523.35
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 936.5
node_cpu_seconds_total{cpu="0",mode="softirq"} 8087.39
node_cpu_seconds_total{cpu="0",mode="steal"} 21.29
node_cpu_seconds_total{cpu="0",mode="system"} 33360.63
node_cpu_seconds_total{cpu="0",mode="user"} 862602.25
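As a rough check of this idea, the sample above can be parsed and split into "user + system" and "everything else that is not idle". This is only a sketch: it works on the raw cumulative counters for a single CPU rather than on rates, and it says nothing about how cAdvisor itself accounts for CPU time.

```python
import re

# Exposition-format sample copied from the output above (one CPU only).
SAMPLE = """\
node_cpu_seconds_total{cpu="0",mode="idle"} 5.96744154e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 6523.35
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 936.5
node_cpu_seconds_total{cpu="0",mode="softirq"} 8087.39
node_cpu_seconds_total{cpu="0",mode="steal"} 21.29
node_cpu_seconds_total{cpu="0",mode="system"} 33360.63
node_cpu_seconds_total{cpu="0",mode="user"} 862602.25
"""

# Extract the mode label and the sample value from each line.
seconds_by_mode = {}
for line in SAMPLE.splitlines():
    match = re.match(r'node_cpu_seconds_total\{.*mode="(\w+)"\} (\S+)', line)
    if match:
        seconds_by_mode[match.group(1)] = float(match.group(2))

user_plus_system = seconds_by_mode["user"] + seconds_by_mode["system"]
non_idle = sum(v for mode, v in seconds_by_mode.items() if mode != "idle")
other_modes = non_idle - user_plus_system  # iowait, irq, nice, softirq, steal

print(f"user+system: {user_plus_system:.2f}s")
print(f"non-idle:    {non_idle:.2f}s")
print(f"other modes: {other_modes:.2f}s")
```

For this sample, "user" and "system" account for most but not all of the non-idle time. The remainder comes from the other modes, which is the kind of gap the question describes.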
Upvotes: 1