Vowneee

Reputation: 1481

Azure DevOps Agent on AKS (D16ds_v5) Shows 100% CPU Usage and Slow Android Builds

I am running an Azure DevOps self-hosted agent inside a container on Azure Kubernetes Service (AKS). The AKS node pool uses Standard_D16ds_v5 (16 vCPUs, 64 GB RAM, Ephemeral SSD).

Issue:

The agent container shows 100% CPU usage, and Android builds are slow.

What I Checked:

  1. Kubernetes CPU Limits:
    • Initially set requests.cpu: "4" and limits.cpu: "12".
  2. Ephemeral Disk Usage:
    • Mounted /mnt into the container for the Gradle cache.
    • Set GRADLE_USER_HOME=/mnt/gradle_cache, but builds remain slow.

Questions:

  1. Why is my container consuming 100% CPU, despite running on a high-performance VM?
  2. Could Kubernetes CPU scheduling (cgroups) be limiting performance?
  3. Is there a way to ensure the ADO agent and Gradle build utilize the ephemeral SSD optimally?
  4. Any best practices for optimizing Android builds in AKS?

Any insights would be greatly appreciated!

Upvotes: 0

Views: 41

Answers (1)

Bright Ran-MSFT

Reputation: 13944

Many factors can cause high CPU usage in AKS clusters; however, the most common causes are usually related to user configuration.
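One configuration-related cause worth ruling out for the setup in the question is CPU throttling. With limits.cpu: "12", the kubelet sets a CFS bandwidth quota of 12 CPUs' worth of time per scheduling period (1200000 microseconds per default 100000-microsecond period), so a build that reaches the quota is throttled even though the node has 16 vCPUs, and tools that report usage relative to the limit can show this as 100%. The following is a minimal sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup inside the agent container; the helper names are hypothetical. It reads cpu.max and cpu.stat and reports how often the container was throttled. On cgroup v1 the files live under a cpu controller directory and report throttled_time in nanoseconds instead.

```python
from pathlib import Path
from typing import Dict, Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumption: cgroup v2 mounted here inside the container


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats


def throttle_ratio(stats: Dict[str, int]) -> float:
    """Fraction of CFS enforcement periods in which the cgroup hit its quota."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


def cpu_limit_from_cpu_max(text: str) -> Optional[float]:
    """Convert cpu.max ('QUOTA PERIOD' or 'max PERIOD') to a CPU count; None means unlimited."""
    quota, period = text.split()
    return None if quota == "max" else int(quota) / int(period)


if __name__ == "__main__":
    cpu_max = CGROUP_ROOT / "cpu.max"
    cpu_stat = CGROUP_ROOT / "cpu.stat"
    # Both files may be absent (e.g. cgroup v1 or the root cgroup), so check first.
    if cpu_max.exists():
        print("CPU limit (CPUs):", cpu_limit_from_cpu_max(cpu_max.read_text()))
    if cpu_stat.exists():
        stats = parse_cpu_stat(cpu_stat.read_text())
        print(f"Throttled in {throttle_ratio(stats):.1%} of periods, "
              f"{stats.get('throttled_usec', 0) / 1e6:.1f} s in total")
```

The counters in cpu.stat are cumulative, so running the script before and after a build and comparing nr_throttled and throttled_usec shows whether the container limit, rather than the node, is the bottleneck.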

You can follow the main steps below to troubleshoot high CPU usage in AKS clusters:

  1. Use the Container Insights feature of AKS to identify nodes/containers with high CPU usage.

  2. Consider implementing any of the following best practices for avoiding high CPU usage:

    • Set appropriate limits for containers: Set appropriate CPU and memory requests and limits for each container; together they determine the pod's Kubernetes Quality of Service (QoS) class (Guaranteed, Burstable, or BestEffort).
    • Enable Horizontal Pod Autoscaler (HPA): Setting appropriate requests and limits together with enabling HPA can help resolve high CPU usage.
    • Select a higher SKU: Use a larger VM SKU for the node pool to handle CPU-intensive workloads.
    • Isolate system and user workloads: It is recommended to create a separate user node pool (other than the default agentpool) to run your workloads, which prevents overloading the system node pool and provides better performance.
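To make the QoS point in step 2 concrete: Kubernetes assigns Guaranteed when every container sets CPU and memory limits equal to its requests (a missing request defaults to the limit), BestEffort when no container sets any CPU or memory request or limit, and Burstable otherwise. The sketch below is a simplified, hypothetical re-implementation of those rules, not a Kubernetes API: it only looks at the containers passed in and supports a subset of quantity suffixes. It is applied to the requests.cpu: "4" / limits.cpu: "12" settings from the question.

```python
from fractions import Fraction
from typing import Dict, List

# Subset of Kubernetes quantity suffixes (binary, decimal, and milli).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Fraction(1, 1000),
}
_RESOURCES = ("cpu", "memory")


def parse_quantity(q: str) -> Fraction:
    """Parse a quantity such as "4", "500m" or "48Gi" into an exact number."""
    # Check two-character suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Fraction(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(q)


def qos_class(containers: List[Dict[str, Dict[str, str]]]) -> str:
    """Simplified version of the Kubernetes QoS rules for a pod's containers."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        requests = dict(c.get("requests", {}))
        for r in _RESOURCES:
            # Kubernetes defaults a missing request to the limit.
            if r in limits and r not in requests:
                requests[r] = limits[r]
            if r in limits or r in requests:
                any_set = True
            if r not in limits or parse_quantity(requests[r]) != parse_quantity(limits[r]):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Settings from the question: requests.cpu "4", limits.cpu "12" (memory not stated).
print(qos_class([{"requests": {"cpu": "4"}, "limits": {"cpu": "12"}}]))  # Burstable
```

Setting CPU and memory requests equal to their limits (for example both CPU values at "12", plus matching memory values) would make the agent pod Guaranteed instead.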

For more details, you can refer to the documentation "Troubleshoot high CPU usage in AKS clusters".
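For the HPA suggestion in step 2, the Kubernetes documentation describes the core scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no scaling when that ratio is within a tolerance of 1.0 (0.1 by default). For a CPU utilization target, the metric is measured as a percentage of each pod's CPU request, which is one reason requests need to be set. A minimal sketch of that rule follows; desired_replicas is a hypothetical helper, and it deliberately ignores minReplicas/maxReplicas, stabilization windows, and missing or unready pods.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(current_replicas * current_value / target_value).

    Simplified: ignores min/max replica bounds, stabilization windows,
    and missing or not-ready pods.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    return math.ceil(current_replicas * ratio)


# 2 agent pods averaging 90% of their CPU request, target 60% -> scale to 3.
print(desired_replicas(2, 90, 60))  # 3
```

For a build agent, scaling out adds agent pods that can pick up more pipeline jobs in parallel; it does not give a single running Gradle build more CPU.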


Upvotes: 0
