user9517026

Reputation: 123

Analyze Kubernetes pod OOMKilled

We got an OOMKilled event on our K8s pods. In case of such an event, we want to run a native memory analysis command BEFORE the pod is evicted. Is it possible to add such a hook?

To be more specific: we run with the -XX:NativeMemoryTracking=summary JVM flag. We want to run jcmd <pid> VM.native_memory summary.diff just BEFORE pod eviction to see what causes the OOM.
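For reference, a minimal sketch of roughly how this is set up (the container name, image, and memory limit below are illustrative, not our real values):

    containers:
    - name: app                      # illustrative
      image: our-java-app:latest     # illustrative
      env:
      - name: JAVA_TOOL_OPTIONS      # picked up by the JVM at startup
        value: "-XX:NativeMemoryTracking=summary"
      resources:
        limits:
          memory: "4Gi"

and the commands we would like to run before eviction (summary.diff needs a baseline recorded earlier):

    jcmd <pid> VM.native_memory baseline
    jcmd <pid> VM.native_memory summary.diff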

Upvotes: 7

Views: 7845

Answers (2)

zangw

Reputation: 48566

Here are some steps we used to analyze pod OOMKilled events in K8s.

  • Memory usage monitoring through Prometheus

    • Monitor not only the metric container_memory_working_set_bytes but also container_memory_max_usage_bytes.
    • These metrics can reveal abnormal memory growth, which we can then investigate further in our application code (see the PromQL sketch after this list).
  • Check the system log

    • On a systemd-based Linux distribution, the tool to use is journalctl.
      sudo journalctl --utc -ke
    
    • -ke shows only kernel messages and jumps to the end of the log. Some sample log lines:
      memory: usage 4194304kB, limit 4194304kB, failcnt 1239
      memory+swap: usage 4194304kB, limit 9007199254740988kB, failcnt 0
      kmem: usage 13608kB, limit 9007199254740988kB, failcnt 0
      Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burst...
    
    • This gives a system-level view of any abnormal memory usage.
  • Check memory cgroup stats for the cgroup path shown in the log above

    • Check the files under /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice to find the memory accounting information (see the shell sketch after this list).
  • Memory dump of pod

    • We can add a preStop hook to the pod to dump its memory; further investigation can then be done on the dump file. Here is a Java example (a native-memory variant is sketched after this list):
      lifecycle:
        preStop:
          exec:
            command:
            - sh
            - -c
            - "jmap -dump:live,format=b,file=/folder/dump_file 1"
    
  • Continuous profiling

    • Use continuous profiling tools such as Pyroscope to monitor the pod's memory usage and find the leak point in the profile data.
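As a PromQL sketch for the monitoring bullet above, here is an alert rule that fires when a container's working set approaches its memory limit. It assumes kube-state-metrics is installed (for kube_pod_container_resource_limits), and the 0.9 threshold and 5m duration are arbitrary choices:

    groups:
    - name: pod-memory
      rules:
      - alert: ContainerNearMemoryLimit
        expr: |
          container_memory_working_set_bytes{container!=""}
            / on(namespace, pod, container)
          kube_pod_container_resource_limits{resource="memory"} > 0.9
        for: 5m
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }}/{{ $labels.container }} is above 90% of its memory limit"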
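For the cgroup bullet, a shell sketch of the files worth reading under that hierarchy (cgroup v1, as in the log above; <pod-slice> is a placeholder, since the exact pod/container directory names depend on the pod UID and container runtime):

    cd /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice
    cat <pod-slice>/memory.usage_in_bytes   # current usage of the pod's cgroup
    cat <pod-slice>/memory.limit_in_bytes   # the limit it is being killed against
    cat <pod-slice>/memory.failcnt          # how often the limit was hit
    cat <pod-slice>/memory.stat             # rss, cache, mapped_file, ... breakdown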
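And for the memory dump bullet, a hedged sketch of the native-memory variant the question asks about. It assumes the JVM runs as PID 1 in the container, that NativeMemoryTracking is enabled, and that /folder is writable; note that a preStop hook runs on graceful termination and eviction, not when the kernel OOM-kills the process:

    lifecycle:
      postStart:
        exec:
          # record a baseline so summary.diff has something to compare against
          command: ["sh", "-c", "jcmd 1 VM.native_memory baseline"]
      preStop:
        exec:
          command: ["sh", "-c", "jcmd 1 VM.native_memory summary.diff > /folder/nmt_diff.txt"]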

Upvotes: 1

Anton Kostenko

Reputation: 9033

It looks like this is almost impossible to handle.

Based on an answer on GitHub about gracefully stopping on OOM kill:

It is not possible to change OOM behavior currently. Kubernetes (or runtime) could provide your container a signal whenever your container is close to its memory limit. This will be on a best effort basis though because memory spikes might not be handled on time.

Here is a quote from the official documentation:

If the node experiences a system OOM (out of memory) event prior to the kubelet being able to reclaim memory, the node depends on the oom_killer to respond. The kubelet sets an oom_score_adj value for each container based on the quality of service for the Pod.

So, as you can see, you do not have much chance of handling it. There is a long article about OOM handling; I will quote just a small part here, about the memory controller's out-of-memory handling:

Unfortunately, there may not be much else that this process can do to respond to an OOM situation. If it has locked its text into memory with mlock() or mlockall(), or it is already resident in memory, it is now aware that the memory controller is out of memory. It can't do much of anything else, though, because most operations of interest require the allocation of more memory.

The only thing I can offer is getting data from cAdvisor (where you can see an OOM Killer event) or from the Kubernetes API, and running your command when the metrics show that you are very close to running out of memory. I am not sure you will have time to do anything after you get the OOM Killer event.
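For illustration only, a rough sketch of that idea as an in-container watchdog script (cgroup v1 paths, the 90% threshold, and the JVM running as PID 1 are all assumptions, and it is best effort: a fast spike can still be OOM-killed before the loop notices):

    #!/bin/sh
    # Poll this container's own cgroup memory usage and dump NMT data
    # when it crosses ~90% of the limit.
    LIMIT=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
    THRESHOLD=$((LIMIT / 10 * 9))
    while true; do
      USAGE=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
      if [ "$USAGE" -gt "$THRESHOLD" ]; then
        jcmd 1 VM.native_memory summary.diff > /tmp/nmt_diff_$(date +%s).txt
        sleep 60
      fi
      sleep 5
    done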

Upvotes: 12
