computer007

Reputation: 57

CPU usage reported in /proc/stat is inconsistent (wrong number of cpu ticks)

Summary of the issue:

To report CPU usage per core, I decided to rely on the /proc/stat file populated by the kernel. I am running an application inside a Docker container pinned to core 1 of my CPU (through the --cpuset-cpus option of docker run). I put no limit on CPU usage on this core (i.e. no --cpus option).

Among the threads of my application, one in particular generates most of the CPU load. Thanks to the core pinning, I know that this load is only on core 1, and since htop reports 25% CPU usage for this thread, I would expect at least 25% usage on core 1 in the bar graph. However, this is not always the case. I say "not always" because, for no obvious reason, the percentage sometimes looks consistent and sometimes does not.

Based on the documentation I read, the /proc/stat "file" is supposed to report the accumulated CPU ticks per "category" (i.e. user level, kernel level, idle, etc.). So, collecting the values every second, summing all columns for core 1, and then computing the difference with the previous result should give a value very close to the configured kernel jiffies per second (i.e. 100 on my system). However, I get values such as 73 or 75 instead. As I collect the values through a bash script, I know the polling period will not be very precise, but that does not explain such a big difference (75 vs 100, which, by the way, is very close to the 25% load that is missing).
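The check described above can be sketched in Python as follows. This is only an illustration, not the bash script used for the measurements, and the function names (parse_cpu_ticks, total_ticks, ticks_per_second, read_stat, sample) are hypothetical. It assumes the usual /proc/stat column order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice) and sums only the first eight columns, since guest time is already included in user and nice. It also divides by the measured elapsed time, so that an imprecise polling period does not distort the result.

```python
import time


def parse_cpu_ticks(stat_text, cpu="cpu1"):
    """Return the tick counters of one CPU line from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        # Exact match on the label, so "cpu1" does not match "cpu10".
        if fields and fields[0] == cpu:
            return [int(value) for value in fields[1:]]
    raise KeyError(f"no line for {cpu!r} in /proc/stat text")


def total_ticks(counters):
    """Sum user..steal (first 8 columns).

    The guest columns are skipped on the assumption that guest time
    is already counted in user and nice.
    """
    return sum(counters[:8])


def ticks_per_second(before, after, elapsed_s):
    """Tick delta between two samples, normalised by the real elapsed time."""
    return (total_ticks(after) - total_ticks(before)) / elapsed_s


def read_stat():
    """Return (monotonic timestamp, /proc/stat content). Linux only."""
    with open("/proc/stat") as f:
        return time.monotonic(), f.read()


def sample(cpu="cpu1", interval_s=1.0):
    """Take two samples interval_s apart and return ticks per second."""
    t0, first = read_stat()
    time.sleep(interval_s)
    t1, second = read_stat()
    return ticks_per_second(
        parse_cpu_ticks(first, cpu), parse_cpu_ticks(second, cpu), t1 - t0
    )
```

If every tick were accounted for, sample("cpu1") would be expected to return a value close to the CLK_TCK figure (100 on the system described here).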

[Screenshot: htop inconsistent cpu load]

In the screenshot above, processes are ordered by CPU usage. I configured htop not to hide kernel tasks. You can see that the 25% load is not reflected in the bar graph in the top left corner. For info, htop was run with the -d 10 option to refresh the values every second. I have not provided a video, but the percentages in the bar graph always remain under 10% over time.

Below is the /proc/stat extract, sampled every second, with the computed diff:

[Screenshot: /proc/stat extract per second with computed diff]

EDIT1: here is a second case I am facing (which highlights the "random" behavior, as this time the usage is over-estimated):

[Screenshot: htop wrong bar graph]

And the corresponding /proc/stat extract, sampled every second, with the computed diff:

[Screenshot: wrong /proc/stat content]


More details:

Linux raspberrypi 6.1.19-v8+ #1637 SMP PREEMPT Tue Mar 14 11:11:47 GMT 2023 aarch64 GNU/Linux

I also observed similar behavior on an x86 machine running Ubuntu 22.04 with kernel 5.15.0-91-lowlatency #101-Ubuntu SMP PREEMPT, so it does not seem to be tied to a specific processor architecture, OS, or kernel.

pi@raspberrypi:~ $ getconf CLK_TCK
100
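The same value can be read from Python through os.sysconf, which makes it easy to put a number on the shortfall described earlier (about 75 ticks observed where 100 are expected). This is a minimal sketch; expected_ticks and missing_fraction are hypothetical helper names, not part of any library.

```python
import os


def expected_ticks(interval_s, clk_tck=None):
    """Ticks one core should accumulate over interval_s seconds."""
    if clk_tck is None:
        # Same value as `getconf CLK_TCK` (Unix only).
        clk_tck = os.sysconf("SC_CLK_TCK")
    return interval_s * clk_tck


def missing_fraction(observed, expected):
    """Share of the expected ticks that did not show up in the counters."""
    return 1 - observed / expected
```

With the figures from this question, missing_fraction(75, expected_ticks(1.0, 100)) evaluates to 0.25, which matches the 25% load that does not appear in the counters.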

On core 1 I observed the following interrupt counts, without being able to tell whether the number of arch_timer occurrences per second is excessive or not:

[Screenshot: interrupts count]

Questions:

Do you know what could explain this incorrect CPU tick count in /proc/stat? I have heard of the NOHZ option in the kernel; does it look relevant in this case?

Upvotes: 0

Views: 434
