gpl

Reputation: 451

What is the meaning of the container_cpu_cfs_throttled_seconds_total metric?

cAdvisor exposes two metrics: container_cpu_cfs_throttled_seconds_total and container_cpu_cfs_throttled_periods_total.

I am confused about what they mean.

I have found two explanations:

  1. A container runs with a CPU limit. When the container's CPU usage goes over the limit, the container is "throttled" and the throttled time is added to container_cpu_cfs_throttled_seconds_total.

    That means:
     (1). rate(container_cpu_cfs_throttled_seconds_total) > 0 only when the container's CPU usage is over the limit.
     (2). We can use this metric to alert on a container going over its CPU limit.
    
  2. When the host is under heavy CPU pressure, it "throttles" containers according to Pod QoS class (Guaranteed > Burstable > BestEffort).

    That means:
     (1). Increases in container_cpu_cfs_throttled_seconds_total are unrelated to how much CPU the container uses or to its CPU limit.
     (2). This metric cannot be used to alert on a container going over its CPU limit.
    

Upvotes: 45

Views: 39236

Answers (2)

DaveFar

Reputation: 7447

container_cpu_cfs_throttled_seconds_total is the sum of all throttle durations, i.e. the total time the container was throttled (stopped from running) by CFS cgroup bandwidth control.

Since each stopped thread adds its throttled duration to container_cpu_cfs_throttled_seconds_total, this number can become huge and is not very helpful on its own (unless you have a known, fixed number of threads, or want an estimate of how much more CPU the container needs in order to avoid being throttled).
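To illustrate why the seconds counter can grow faster than wall-clock time, here is a toy model, not the kernel's actual accounting. It assumes every thread wants to run for the whole period, the quota is consumed evenly across threads, and each stopped thread contributes its own stopped time. The function name `simulate` and all numbers are made up for the example; the 100 ms period is the usual CFS default.

```python
def simulate(threads: int, quota_ms: float, period_ms: float = 100.0, periods: int = 10):
    """Toy model of CFS bandwidth control for one container.

    Returns (periods, throttled_periods, throttled_seconds), mirroring the
    three cAdvisor counters. Assumes every thread is always runnable.
    """
    nr_periods = 0
    nr_throttled = 0
    throttled_ms = 0.0
    for _ in range(periods):
        nr_periods += 1
        demand_ms = threads * period_ms  # CPU time the threads would like to use
        if demand_ms > quota_ms:
            nr_throttled += 1
            # Wall-clock time until the shared quota is used up.
            running_wall_ms = quota_ms / threads
            # Every thread is then stopped for the rest of the period.
            throttled_ms += threads * (period_ms - running_wall_ms)
    return nr_periods, nr_throttled, throttled_ms / 1000.0


# One second of wall-clock time (10 periods), limit of 1 CPU (100 ms per period).
one_thread = simulate(threads=1, quota_ms=100.0)
four_threads = simulate(threads=4, quota_ms=100.0)
print(one_thread)
print(four_threads)
```

In this model a single busy thread never exceeds a 1-CPU limit, while four busy threads accumulate 3 throttled seconds within 1 second of wall-clock time, which is why the raw seconds value is hard to interpret without knowing the thread count.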

That is why alerting on CPU throttling is usually based on the throttled percentage := container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total, i.e. the percentage of CPU periods in which the container ran but was throttled (prevented from running for the full CPU period).
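As a minimal sketch of that ratio, assuming you have two scrapes of both counters for the same container: since the counters are cumulative, the ratio is taken over the increase between scrapes. The `CfsSample` type, the `throttled_ratio` helper, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CfsSample:
    """One scrape of the two cAdvisor counters for a (hypothetical) container."""
    periods_total: int            # container_cpu_cfs_periods_total
    throttled_periods_total: int  # container_cpu_cfs_throttled_periods_total


def throttled_ratio(earlier: CfsSample, later: CfsSample) -> float:
    """Fraction of CFS periods between two scrapes in which the container was throttled."""
    elapsed = later.periods_total - earlier.periods_total
    if elapsed <= 0:
        # No periods elapsed (or the counter was reset): nothing to report.
        return 0.0
    throttled = later.throttled_periods_total - earlier.throttled_periods_total
    return throttled / elapsed


# Made-up sample values, for illustration only.
earlier = CfsSample(periods_total=1000, throttled_periods_total=100)
later = CfsSample(periods_total=1600, throttled_periods_total=250)
ratio = throttled_ratio(earlier, later)
print(f"throttled in {ratio:.0%} of periods")
```

An alert would then compare this ratio against a threshold of your choosing; what counts as too much throttling depends on the workload.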

For more detail, you can watch this talk on CFS and CPU scheduling, or read the corresponding article.

Upvotes: 51

ffran09

Reputation: 1035

Let's say an httpbin container is running on machine1. Let's say httpbin has a limit set in its deployment to use a maximum of 1 CPU, and machine1 has 2 CPUs. That lets httpbin use at most half of the available CPU.

If the httpbin container tries to use more than 1 CPU, Kubernetes will not kill the container; it will throttle it. If this happens frequently, you may want to be alerted and fix the deployment. Another scenario: if there are multiple containers on machine1 and CPU is scarce, then it will throttle all the containers it has.

container_cpu_cfs_throttled_seconds_total is the total time, in seconds, that the container has been throttled. container_cpu_cfs_throttled_periods_total is the number of throttled period intervals.

Upvotes: 9
