Danny Cadavid

Reputation: 1

Why is the CPU usage of a GKE Workload not equal to the sum of the CPU usage of its pods?

I'm trying to figure out why the CPU usage of a GKE "Workload" is not equal to the sum of the CPU usage of its pods.

The following image shows the Workload's CPU usage.

Service Workload CPU Usage

The following images show the CPU usage of the pods in that Workload.

Pod #1 CPU Usage

Pod #2 CPU Usage

For example, at 9:45 the Workload CPU usage was around 3.7 cores, but at the same time Pod #1 and Pod #2 were each using around 0.9 cores. That means the Workload CPU usage should have been around 1.8 cores, but it wasn't.

Does anyone know what causes this behavior?

Thanks.

Upvotes: 0

Views: 1849

Answers (3)

guillaume blaquiere

Reputation: 75775

On your VM (the node managed by Kubernetes), you have the pods that you deployed and manage, but also several services that run on the node for supervision, management, log ingestion, and so on. A basic description is here.

You can list these basic services by running the command kubectl get all --namespace kube-system.

If you have installed additional components, like Istio or Knative, you have additional services and namespaces. All of these consume part of the node's resources.

Upvotes: 1

Gabo Licea

Reputation: 198

Danny,

The CPU chart on the Workloads page is an aggregate of CPU usage for the managed pods. The values are taken from the Stackdriver Monitoring metric container/cpu/usage_time; see this link. That metric represents "Cumulative CPU usage on all cores in seconds. This number divided by the elapsed time represents usage as a number of cores, regardless of any core limit that might be set."
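To make the "divided by the elapsed time" part concrete, here is a minimal Python sketch of how a cumulative CPU-seconds counter turns into a number of cores. The function name and the sample values are hypothetical, chosen only so that each pod works out to about 0.9 cores as in the question; this is not how the console itself computes the chart.

```python
def cores_from_cumulative(samples):
    """Convert cumulative CPU-seconds samples into per-interval core usage.

    samples: list of (timestamp_seconds, cumulative_cpu_seconds), sorted by time.
    Returns a list of (timestamp_seconds, cores), one entry per interval.
    """
    usage = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        usage.append((t1, (c1 - c0) / elapsed))
    return usage


# Hypothetical samples: each pod burns 54 CPU-seconds per 60 s of wall time.
pod1 = [(0, 100.0), (60, 154.0), (120, 208.0)]
pod2 = [(0, 40.0), (60, 94.0), (120, 148.0)]

pod1_cores = cores_from_cumulative(pod1)
pod2_cores = cores_from_cumulative(pod2)

# Sum the two pods interval by interval.
total = [(t, a + b) for (t, a), (_, b) in zip(pod1_cores, pod2_cores)]
print(total)
```

With these made-up samples, each pod comes out at roughly 0.9 cores per interval and the sum at roughly 1.8 cores, which is the figure the question expected the Workload chart to show.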

Please let me know if you have further questions in regard to this.

Upvotes: 1

Patrick W

Reputation: 4899

I suspect this is a bug in the UI. There is no actual metric for deployment CPU usage. Stackdriver Monitoring only collects container, pod, and node level metrics, so the only really reliable metrics in this case are the ones for pod CPU usage.

The graph for the total deployment CPU usage is likely meant to be the sum of all the pods' metrics, calculated and then presented to you. It is not as reliable as the pod or container metrics since it is not a direct metric.
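If you want to check this yourself before filing a report, the comparison can be sketched in a few lines of Python. The function names, the data layout, and the tolerance below are assumptions for illustration; the 9:45 values are the approximate ones read from the charts in the question.

```python
from collections import defaultdict


def sum_pod_usage(pod_series):
    """Sum per-pod core usage by timestamp.

    pod_series: dict mapping pod name -> list of (timestamp, cores).
    """
    totals = defaultdict(float)
    for series in pod_series.values():
        for ts, cores in series:
            totals[ts] += cores
    return dict(totals)


def find_discrepancies(workload_series, pod_series, tolerance=0.1):
    """Return (timestamp, shown, expected) where the chart and the sum disagree."""
    expected = sum_pod_usage(pod_series)
    return [
        (ts, shown, expected[ts])
        for ts, shown in workload_series
        if ts in expected and abs(shown - expected[ts]) > tolerance
    ]


# Approximate values from the question at 9:45.
pods = {"pod-1": [("09:45", 0.9)], "pod-2": [("09:45", 0.9)]}
workload = [("09:45", 3.7)]

for ts, shown, expected in find_discrepancies(workload, pods):
    print(f"{ts}: workload chart shows {shown} cores, pods sum to {expected:.1f}")
```

Any timestamp this reports is a point where the Workload chart and the pod-level metrics disagree by more than the chosen tolerance, which is useful evidence to attach to the bug report.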

If you are seeing this discrepancy consistently, I recommend opening a UI bug report through the Google Public Issue Tracker to report this to the GCP Engineers.

Upvotes: 0
