dorinand

Reputation: 1737

What is real CPU usage of container from docker stats command

I am running postgres timescaledb on my docker swarm node. I set the CPU limit to 4 and the memory limit to 32G. When I check docker stats, I can see this output:

CONTAINER ID   NAME                                                                                     CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
c6ce29c7d1a4   pg-timescale.1.6c1hql1xui8ikrrcsuwbsoow5                                                 341.33%   20.45GiB / 32GiB      63.92%    6.84GB / 5.7GB    582GB / 172GB     133

CPU% is oscillating around 400%. The node has 6 CPUs and the average load has been 1 - 2 (1-minute load average), so according to me, with my limit of 4 CPUs, the maximum load should be oscillating around 6. My current load is 20 (1-minute load average), and the output of the top command from inside the postgres container shows 50-60%.

My service configuration limit:

deploy:
  resources:
    limits:
      cpus: '4'
      memory: 32G

I am confused, all the values are different, so what is the real CPU usage of postgres and how do I limit it? My server load is pushed to the maximum even though the postgres limit is set to 4. Inside the postgres container I can see from htop that there are 6 cores and 64G of memory, so it looks like it has all the resources of the host. From docker stats the maximum CPU is 400%, which correlates with the limit of 4 CPUs.
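
For reference, this is roughly how I would try to confirm the limit that is actually applied to the container (a rough sketch; the cgroup file paths assume cgroup v1, on a cgroup v2 host the equivalent file is /sys/fs/cgroup/cpu.max inside the container):

# Effective CFS quota/period inside the container (cgroup v1 paths);
# a 4-CPU limit should show quota 400000 against the default period 100000.
docker exec <container-id> cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
docker exec <container-id> cat /sys/fs/cgroup/cpu/cpu.cfs_period_us

# Per-container stats in a machine-readable form:
docker stats --no-stream --format '{{.Name}}: {{.CPUPerc}} {{.MemUsage}}'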

Upvotes: 1

Views: 4799

Answers (1)

BMitch

Reputation: 265110

Load average from commands like top in Linux refers to the number of processes running or waiting to run, averaged over some time period. CPU limits used by docker specify the number of CPU cycles over some timeframe permitted for processes inside a cgroup. These aren't really measuring the same thing, especially when you factor in things like I/O waiting. You can have a process that wants to run but is blocked waiting for a read from disk, increasing your load measurements on the host without using any CPU cycles.
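
One quick way to see this effect on the host (a rough check, not specific to docker): tasks in uninterruptible sleep, usually blocked on disk I/O, count toward the load average even though they are not consuming CPU:

# Tasks in state D (uninterruptible sleep, typically blocked on I/O);
# each of these adds to the load average without burning CPU cycles.
ps -eo state,pid,comm | awk '$1 == "D"'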

When calculating how much CPU to allocate to a cgroup, not only do you need to factor in the I/O and other system needs of the process, but also consider queuing theory as you approach saturation on the CPU. The closer you get to 100% utilization of the CPU, the longer the queue of processes ready to run will likely be, resulting in significant jumps in load measurements.
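
As a back-of-the-envelope illustration (using a simple M/M/1 queueing model, not a claim about how postgres itself behaves):

# Average number of jobs in the system for an M/M/1 queue is rho/(1-rho),
# which blows up as utilization rho approaches 1.
for rho in 0.50 0.80 0.90 0.95 0.99; do
  awk -v r="$rho" 'BEGIN { printf "utilization %3.0f%% -> ~%.0f jobs in system\n", r*100, r/(1-r) }'
done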

Setting these limits correctly will likely require trial and error, because not all processes are the same and not every workload on the host is the same. A batch processing job that kicks off at irregular intervals and saturates the drives and network will have a very different impact on the host from a scientific computation that is heavily CPU and memory bound.
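
If you want to iterate on the limit without redeploying the whole stack, something like this should work ("pg-timescale" is the service name from your stats output; note that swarm will restart the tasks when the limit changes):

# Lower the CPU limit on the running swarm service, then watch the effect.
docker service update --limit-cpu 2 pg-timescale
docker stats --no-stream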

Upvotes: 1
