Reputation: 1737
I am running postgres timescaledb on my docker swarm node. I set limit of CPU to 4 and Mem limit to 32G. When I check docker stats
, I can see this output:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
c6ce29c7d1a4 pg-timescale.1.6c1hql1xui8ikrrcsuwbsoow5 341.33% 20.45GiB / 32GiB 63.92% 6.84GB / 5.7GB 582GB / 172GB 133
CPU% is oscilating around 400%. Node has 6CPUs and average load has been 1 - 2 (1minute load average), So according to me, with my limit of CPUs - 4, the maximum load should be oscilating around 6. My current load is 20 (1minute load average), and output of top
command from inside of postgres show 50-60%.
My service configuration limit:
deploy:
resources:
limits:
cpus: '4'
memory: 32G
I am confused, all values are different so what is real CPU usage of postgres and how to limit it ? My server load is pushed to maximum even limit of postgres is set to 4. Inside postgres I can see from htop that there is 6 cores and 64G MEM so its looks like it has all resources of the hosts. From docker stats maximum cpu is 400% - corelate with limit of 4 cpus.
Upvotes: 1
Views: 4799
Reputation: 265110
Load average from commands like top
in Linux refer to the number of processes running or waiting to run on average over some time period. CPU limits used by docker specify the number of CPU cycles over some timeframe permitted for processes inside of a cgroup. These aren't really measuring the same thing, especially when you factor in things like I/O waiting. You can have a process waiting for a read from disk, that wants to run but is blocked on that I/O call, increasing your load measurements on the host, but not using any CPU cycles.
When calculating how much CPU to allocate to a cgroup, no only do you need to factor in the I/O and other system needs of the process, but consider queuing theory when you approach saturation on the CPU. The closer you get to 100% utilization of the CPU, the longer the queue of processes ready to run will likely be, resulting in significant jumps in load measurements.
Setting these limits correctly will likely require trial and error because not all processes are the same, and not all workload on the host is the same. A batch processing job that kicks off at irregular intervals and saturates the drives and network will have a very different impact on the host from a scientific computation that is heavily CPU and memory bound.
Upvotes: 1