Reputation: 415
I run a custom setup of Kubernetes v1.9.2 and scrape various metrics with Prometheus v2.1.0. Among others, I scrape the kubelet and cAdvisor metrics.
I want to answer the question: "How much of the CPU resources defined by requests and limits in my deployment are actually used by a pod (and its containers) in terms of (milli)cores?"
There are a lot of scraped metrics available, but nothing like that. Maybe it could be derived from the CPU usage time in seconds, but I don't know how.
I had assumed it wasn't possible, until a friend told me she runs Heapster in her cluster, which has a graph in the built-in Grafana that shows exactly that: the individual CPU usage of a pod and its containers in (milli)cores.
Since Heapster also uses kubelet and cAdvisor metrics, I wonder: how can I calculate the same? The metric in InfluxDB is named cpu/usage_rate, but even with Heapster's code, I couldn't figure out how they calculate it.
Any help is appreciated, thanks!
Upvotes: 23
Views: 21594
Reputation: 17784
The following PromQL query returns the number of CPU cores used per pod on Kubernetes v1.16 and newer:
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)
The {container!=""} filter is needed to exclude the hierarchical cgroup stats, which duplicate usage already counted in the per-container stats. See this answer for more details.
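To illustrate why the filter matters, here is a minimal Python sketch of the aggregation. The pod name, container names, and per-series rates are made up for illustration; the only assumption taken from the text above is that the series with an empty container label already includes the usage of the containers beneath it.

```python
# Hypothetical per-series CPU rates (in cores) for a single pod.
# The series with an empty `container` label stands for the pod-level
# cgroup, whose value already includes the containers below it.
series = [
    {"pod": "web-1", "container": "", "cores": 0.35},
    {"pod": "web-1", "container": "app", "cores": 0.25},
    {"pod": "web-1", "container": "sidecar", "cores": 0.10},
]


def sum_by_pod(rows, skip_empty_container):
    """Sum the per-series rates by pod, optionally dropping container == ""."""
    totals = {}
    for row in rows:
        if skip_empty_container and row["container"] == "":
            continue
        totals[row["pod"]] = totals.get(row["pod"], 0.0) + row["cores"]
    return totals


unfiltered = sum_by_pod(series, skip_empty_container=False)
filtered = sum_by_pod(series, skip_empty_container=True)
print(unfiltered, filtered)
```

In this made-up data the unfiltered sum comes out at roughly twice the filtered one, which is the double counting the filter avoids.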
The following PromQL query must be used for Kubernetes below v1.16 because it uses different label names (e.g. container_name instead of container and pod_name instead of pod; see this issue for details):
sum(rate(container_cpu_usage_seconds_total{container_name!=""}[5m])) by (pod_name)
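If the same dashboard or script has to work against clusters on both sides of v1.16, the query string can be generated instead of hard-coded. The helper below is only a sketch: pod_cpu_query and its parameters are hypothetical names, and it assumes you already know the cluster's minor version.

```python
def pod_cpu_query(k8s_minor, window="5m"):
    """Build the per-pod CPU query for a Kubernetes 1.<k8s_minor> cluster."""
    # Label names changed in v1.16: container_name -> container, pod_name -> pod.
    if k8s_minor >= 16:
        container_label, pod_label = "container", "pod"
    else:
        container_label, pod_label = "container_name", "pod_name"
    return (
        f'sum(rate(container_cpu_usage_seconds_total{{{container_label}!=""}}'
        f"[{window}])) by ({pod_label})"
    )


print(pod_cpu_query(16))
print(pod_cpu_query(9))
```

The two printed strings match the two queries given in this answer.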
Upvotes: 5
Reputation: 37934
We're using the container_cpu_usage_seconds_total metric to calculate Pod CPU usage. This metric contains the total CPU seconds consumed by each container, per core. That matters because a Pod may consist of multiple containers, each of which can be scheduled across multiple cores; the metric carries a pod_name label that we can use for aggregation. Of special interest is the rate of change of that metric, which PromQL's rate() function calculates. If the counter increases by 1 within one second, the Pod consumes 1 CPU core (or 1000 millicores) in that second.
The following PromQL query does just that: it computes the CPU usage of each Pod (using the sum(...) by (pod_name) operation), averaged over five minutes:
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod_name)
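The arithmetic behind reading a counter's rate as cores can be sketched in a few lines of Python. The sample values and the 500-millicore request are invented for illustration, and the sketch ignores the counter resets and window extrapolation that Prometheus's rate() also handles.

```python
def cores_used(value_start, value_end, seconds_elapsed):
    """Average cores used between two samples of a CPU-seconds counter."""
    return (value_end - value_start) / seconds_elapsed


# Hypothetical samples of container_cpu_usage_seconds_total taken 300 s apart:
# the counter grew by 75 CPU-seconds during those 300 s of wall-clock time.
cores = cores_used(1200.0, 1275.0, 300.0)
millicores = cores * 1000

# Hypothetical CPU request of 500m, to relate usage to the requested amount.
request_millicores = 500.0
fraction_of_request = millicores / request_millicores

print(cores, millicores, fraction_of_request)
```

Seventy-five CPU-seconds spread over 300 seconds is a quarter of a core on average, i.e. 250 millicores, or half of the assumed request.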
Upvotes: 26