Reputation: 3612
Good day. I know that Docker containers use the host's kernel (which is why containers are considered lightweight VMs); here is the source. However, while reading the Runtime Options part of the Docker documentation I came across an option called --kernel-memory.
The doc says:
The maximum amount of kernel memory the container can use.
I don't understand what it does. My guess is that every container allocates some memory in the host's kernel space. If so, what is the reason for this option, and isn't it a vulnerability for a user process to be able to allocate memory in kernel space?
Upvotes: 3
Views: 3116
Reputation: 1940
The whole CPU/memory limitation mechanism uses cgroups.
You can find all settings applied by docker run (either via arguments or by default) under /sys/fs/cgroup/memory/docker/<container ID> for memory and under /sys/fs/cgroup/cpu/docker/<container ID> for CPU.
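For example (a minimal sketch; the memtest container name and the nginx image are just placeholders, and the path assumes cgroups v1 with the default cgroupfs driver as used above), you can resolve the full container ID and inspect the files Docker created:

docker run -d --name memtest --memory 2g nginx               # any long-running image works
ID=$(docker inspect --format '{{.Id}}' memtest)              # full 64-character container ID
ls /sys/fs/cgroup/memory/docker/$ID                          # memory cgroup files for this container
cat /sys/fs/cgroup/memory/docker/$ID/memory.limit_in_bytes   # 2147483648, i.e. the 2g from --memory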
So for --kernel-memory:
Reading: cat memory.kmem.limit_in_bytes
Writing: sudo -s
echo 2167483648 > memory.kmem.limit_in_bytes
There are also the accounting counterparts memory.kmem.usage_in_bytes and memory.kmem.max_usage_in_bytes, which (rather self-explanatory) show the current usage and the highest usage seen so far.
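To see how the docker run flag maps onto these files (again a sketch with placeholder names; it requires a cgroups v1 host with kernel memory accounting enabled, otherwise Docker ignores or rejects the flag):

docker run -d --name kmemtest --kernel-memory 50m nginx
ID=$(docker inspect --format '{{.Id}}' kmemtest)
cat /sys/fs/cgroup/memory/docker/$ID/memory.kmem.limit_in_bytes   # 52428800, i.e. the 50m
cat /sys/fs/cgroup/memory/docker/$ID/memory.kmem.usage_in_bytes   # current kernel memory usage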
For the actual functionality I recommend reading the kernel docs for cgroups v1 rather than the Docker docs:
2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
With the Kernel memory extension, the Memory Controller is able to limit the amount of kernel memory used by the system. Kernel memory is fundamentally different than user memory, since it can't be swapped out, which makes it possible to DoS the system by consuming too much of this precious resource. [..]
The memory used is accumulated into memory.kmem.usage_in_bytes, or in a separate counter when it makes sense. (currently only for tcp). The main "kmem" counter is fed into the main counter, so kmem charges will also be visible from the user counter.
Currently no soft limit is implemented for kernel memory. It is future work to trigger slab reclaim when those limits are reached.
and
2.7.2 Common use cases
Because the "kmem" counter is fed to the main user counter, kernel memory can never be limited completely independently of user memory. Say "U" is the user limit, and "K" the kernel limit. There are three possible ways limits can be set:
U != 0, K = unlimited:
This is the standard memcg limitation mechanism already present before kmem accounting. Kernel memory is completely ignored.

U != 0, K < U:
Kernel memory is a subset of the user memory. This setup is useful in deployments where the total amount of memory per-cgroup is overcommited. Overcommiting kernel memory limits is definitely not recommended, since the box can still run out of non-reclaimable memory. In this case, the admin could set up K so that the sum of all groups is never greater than the total memory, and freely set U at the cost of his QoS.
WARNING: In the current implementation, memory reclaim will NOT be triggered for a cgroup when it hits K while staying below U, which makes this setup impractical.

U != 0, K >= U:
Since kmem charges will also be fed to the user counter and reclaim will be triggered for the cgroup for both kinds of memory. This setup gives the admin a unified view of memory, and it is also useful for people who just want to track kernel memory usage.
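Translated into docker run flags (my own mapping, not from the docs): --memory corresponds to U and --kernel-memory to K. As an illustration only, on a cgroups v1 host with a Docker version that still supports the flag, the two non-trivial cases would look roughly like:

docker run -d --memory 2g --kernel-memory 1g nginx   # K < U: kernel memory capped below user memory (impractical, see the WARNING above)
docker run -d --memory 2g --kernel-memory 2g nginx   # K >= U: unified view, mainly useful for tracking kernel memory usage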
Given a running container started with --memory="2g" --memory-swap="2g" --oom-kill-disable, running
cat memory.kmem.max_usage_in_bytes
10747904
shows roughly 10 MB of kernel memory in its normal state. It would make sense to me to limit it, say to 20 MB of kernel memory; the container should then be killed or throttled to protect the host. But since - according to the docs - the memory cannot be reclaimed, and the OOM killer starts killing processes on the host even when there is plenty of free memory (see https://github.com/docker/for-linux/issues/1001), for me it is rather impractical to use.
The quoted option to set it >= memory.limit_in_bytes
is not really helpful in that scenario either.
--kernel-memory
is deprecated as of Docker v20.10, because someone (namely the Linux kernel developers) apparently realized all of that as well..
ULimit
The Docker API exposes HostConfig|Ulimit, covering the same limit types you know from /etc/security/limits.conf. For docker run the syntax is --ulimit <type>=<soft>:<hard>. Use cat /etc/security/limits.conf or man setrlimit to see the available categories. You can try to protect your system from kernel memory being filled up, e.g. by someone spawning unlimited processes, with --ulimit nproc=500:500 - but be careful: nproc works per user and not per container, so count them together..
To prevent DoS (intentional or unintentional) I would suggest limiting at least nofile and nproc; see the sketch below. Maybe someone can elaborate further..
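A minimal sketch (limit values and the debian image are arbitrary choices, not recommendations):

docker run --rm --ulimit nofile=1024:2048 --ulimit nproc=500:500 debian bash -c 'ulimit -n; ulimit -u'
# prints the soft limits seen inside the container: 1024 and 500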
sysctl:
docker run --sysctl can change kernel variables for message queues and shared memory, and also networking ones, e.g. docker run --sysctl net.ipv4.tcp_max_orphans= for orphaned TCP connections, which defaults on my system to 131072. At a kernel memory usage of about 64 kB each, that is up to 8 GB on malfunction or DoS. Maybe someone can elaborate further..
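A minimal, hedged sketch of the mechanism (values and the debian image are arbitrary; whether a particular net.* knob can be set per container depends on your kernel exposing it inside the network namespace):

docker run --rm --sysctl kernel.msgmax=16384 debian cat /proc/sys/kernel/msgmax           # message-queue limit inside the container's IPC namespace
docker run --rm --sysctl net.core.somaxconn=1024 debian cat /proc/sys/net/core/somaxconn  # network knob inside the container's network namespace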
Upvotes: 3