Reputation: 593
Question on memory resources on GKE.
I have a node with 8 GB of memory and a workload with the following resources:
resources:
  limits:
    memory: 2560Mi
  requests:
    cpu: 1500m
    memory: 2Gi
Recently I have noticed many cases where I see messages like the following in the VM (GCE) log itself:
[14272.865068] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=3e209c27d4b26f4f63c4f0f1243aeee928f4f2eb4c180e5b986211e3ae1c0b5a,mems_allowed=0,oom_memcg=/kubepods/burstable/podc90baea5-9ea8-49cd-bd38-2adda4250d17,task_memcg=/kubepods/burstable/podc90baea5-9ea8-49cd-bd38-2adda4250d17/3e209c27d4b26f4f63c4f0f1243aeee928f4f2eb4c180e5b986211e3ae1c0b5a,task=chrome,pid=222605,uid=1001
[14272.899698] Memory cgroup out of memory: Killed process 222605 (chrome) total-vm:7238644kB, anon-rss:2185428kB, file-rss:107056kB, shmem-rss:0kB, UID:1001 pgtables:14604kB oom_score_adj:864
[14273.125672] oom_reaper: reaped process 222605 (chrome), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[14579.292816] chrome invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=864
These basically indicate that the node hit an OOM and killed one of the services on the node, in my case chrome, which is the service being run by the workload. At the exact same time I see an error on the workload (a page crash in the browser), but there was no restart for the container.
As far as I know, GKE can evict pods while under memory pressure, so I'm trying to figure out the difference between an OOM of the service itself and an OOM-kill of the pod.
Looking at the memory usage in this timeframe, the pod reached a peak of 2.4 GB and the node itself reached 7.6 GB.
Is the reason the pod wasn't evicted with an OOM-kill error that it did not pass the actual limit? Wasn't the OOM killer supposed to restart the container? Based on the logs, the specific service in the container was just killed and everything 'remains' the same.
Any help will be appreciated. Thanks, CL
Upvotes: 2
Views: 4300
Reputation: 13878
There are a few concepts that need to be explained here. The first is the importance of Requests and Limits:
When a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation, with an out of memory (OOM) error.
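As a minimal sketch of how this looks in a manifest (the pod name and image below are placeholders, and the sizes simply mirror the ones from your question), the request only influences scheduling and the QoS class, while the limit is the hard cap enforced by the memory cgroup; exceeding it is what produces the kernel message you posted:

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo            # placeholder name
spec:
  containers:
  - name: app
    image: nginx               # placeholder image
    resources:
      requests:
        cpu: 1500m
        memory: 2Gi            # used for scheduling and QoS classification (Burstable here)
      limits:
        memory: 2560Mi         # hard cap enforced by the memory cgroup; allocations past it invoke the memcg OOM killer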
The behavior you are facing is well described in this article along with this video.
Then there is the idea of Configuring Out of Resource Handling, especially the Node OOM Behavior:
If the node experiences a system OOM (out of memory) event prior to the kubelet being able to reclaim memory, the node depends on the oom_killer to respond.
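On GKE the kubelet configuration is managed for you, so the following is only an illustration of the concept rather than something to apply to your cluster; on a self-managed node the eviction thresholds described in that document live in the kubelet configuration, roughly like this (the values are examples, not recommendations):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"    # below this the kubelet starts evicting pods itself,
  nodefs.available: "10%"      # before the node-level oom_killer has to step in
evictionSoft:
  memory.available: "300Mi"
evictionSoftGracePeriod:
  memory.available: "30s"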
I highly recommend getting familiar with the linked materials to get a good understanding about the topics you mentioned. Also, there is a good article showing a live example of it: Memory Limit of POD and OOM Killer.
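If you want to reproduce that live example yourself, a throwaway pod along these lines (the image and numbers follow the upstream Kubernetes memory walkthrough, so treat them as an example rather than a recommendation) deliberately allocates more than its limit:

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo-oom        # placeholder name
spec:
  containers:
  - name: stress
    image: polinux/stress
    resources:
      requests:
        memory: 50Mi
      limits:
        memory: 100Mi          # cgroup limit for the container
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]   # tries to allocate well past the limit

kubectl describe pod memory-demo-oom should then show the container's last state as OOMKilled with a growing restart count, which is the restart behavior you were expecting to see for chrome.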
Upvotes: 2