Reputation: 21
I’m trying to use the prometheus measurements to get percent CPU usage for each micro service running in Kubernetes to optimize CPU resources and limits.
I have a setup where for each customer there are 4 micro services running on the server. Each micro service has a separate memory resource and limit and separate CPU resource and limit. To get the average from prometheus I am using the following query:
avg_over_time(sum(rate(container_cpu_usage_seconds_total{name=~"^k8s_.", namespace=~"$namespace", container_name!="POD", pod=~"^$Deployment.$"}[5m]))[24h:5m]) / avg_over_time(sum(container_spec_cpu_quota{name=~"^k8s_.", namespace=~"$namespace",container_name!="POD", pod=~"^$Deployment.$"}/container_spec_cpu_period{name=~"^k8s_.",namespace=~"$namespace", container_name!="POD", pod=~"^$Deployment.$"})[24h:5m]) * 100
To check that the value above is correct, I go into each Kubernetes pod and check the CPU usage using the command: kubectl -n {namespace} top pod {Deployment}
To check the CPU limit I use the command: kubectl -n {namespace} describe pod {Deployment}
Where I get the CPU limit.
Then I do the calculation: CPU usage divided by CPU limit times 100 equals current percent of CPU usage.
The values I get from the CPU usage and limit in Kubernetes are different from the values I get using the prometheus query (Some of the values I get are close and some are quite off). Here is an example of CPU usage in Percent from Prometheus and from Kubernetes:
Customer | Service | Prometheus | Kubernetes |
---|---|---|---|
Customer A | Service 1 | 0.216 | 0.2 |
Service 2 | 0.137 | 0.2 | |
Service 3 | 0.445 | 0.45 | |
Service 4 | 0.165 | 0.2 | |
Customer B | Service 1 | 0.139 | 0.2 |
Service 2 | 0.0917 | 0.2 | |
Service 3 | 0.5739 | 0.5 | |
Service 4 | 0.0972 | 0.2 |
Anyone have any comments whether I am doing the measurements correctly? Is there a mistake in my prometheus query or how I get the values from Kubernetes? I want to make sure that I am measuring the percent CPU usage correctly using prometheus
Upvotes: 2
Views: 1612
Reputation: 291
Can you try the following query for one service and modify the query according your requirement:
sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100
I also track the CPU usage for each pod.
sum (rate (container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name) I have a complete kubernetes-prometheus solution on GitHub, maybe can help you with more metrics: https://github.com/camilb/prometheus-kubernetes.
I hope this will help! The result is pretty much the same as the Windows performance manager. So, for CPU % for running services (tasks, processes):
sum by (process,hostname)(irate(wmi_process_cpu_time_total{scaleset="name", process=~
Upvotes: 1