Joan

Reputation: 408

Java application memory consumption

I have a Java application configured with -Xmx4096m. The application itself is deployed in a k8s pod with a memory limit of 8192Mi. After doing some analysis with the command jcmd 8 VM.native_memory summary, the output (converted to MB) is as follows:

Native Memory Tracking:

Total: reserved=6009MB, committed=4797MB
-                 Java Heap (reserved=4096MB, committed=4094MB)
                            (mmap: reserved=4096MB, committed=4094MB)

-                     Class (reserved=1097MB, committed=82MB)
                            (classes #14238)
                            (malloc=3MB #35561)
                            (mmap: reserved=1094MB, committed=79MB)

-                    Thread (reserved=254MB, committed=254MB)
                            (thread #253)
                            (stack: reserved=253MB, committed=253MB)
                            (malloc=0.8MB #1506)
                            (arena=0.3MB #501)

-                      Code (reserved=254MB, committed=65MB)
                            (malloc=11MB #17281)
                            (mmap: reserved=244MB, committed=54MB)

-                        GC (reserved=226MB, committed=226MB)
                            (malloc=42MB #192857)
                            (mmap: reserved=184MB, committed=184MB)

-                  Compiler (reserved=1.1MB, committed=1.1MB)
                            (malloc=1MB #2175)
                            (arena=0.1MB #6)

-                  Internal (reserved=50MB, committed=50MB)
                            (malloc=50MB #178156)
                            (mmap: reserved=0.03MB, committed=0.03MB)

-                    Symbol (reserved=16MB, committed=16MB)
                            (malloc=13MB #129700)
                            (arena=3MB #1)

-    Native Memory Tracking (reserved=8.5MB, committed=8.5MB)
                            (malloc=0.03MB #339)
                            (tracking overhead=8.5MB)

-               Arena Chunk (reserved=0.2MB, committed=0.2MB)
                            (malloc=0.2MB)

-                   Unknown (reserved=8MB, committed=0MB)
                            (mmap: reserved=8MB, committed=0MB)

These numbers never increase, but the top command shows 7.9 GB of memory usage for the java process. The application then gets killed when it reaches the 8 GB limit with:

Last State:     Terminated
Reason:       OOMKilled

So, my question is: where is this memory going? In other words, what should the next debugging steps be to find the root cause of the issue?

Upvotes: 6

Views: 237

Answers (2)

Ian Carter

Reputation: 2169

If you get OOMKilled even though NMT shows committed memory well below the container limit, you could (the flags are combined into a single example after this list):

  • Limit Direct Memory: -XX:MaxDirectMemorySize=512m
  • Reduce Stack Size: -Xss512k
  • Enable GC Logging: -Xlog:gc*:file=/tmp/gc.log:time,level,tags
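
A minimal sketch of how these could be combined, assuming a JDK 9+ JVM (required for the -Xlog syntax) and using placeholder values; JAVA_TOOL_OPTIONS is picked up by the JVM at startup, so it works without changing the container's entrypoint:

# Sketch only: placeholder values, adjust for your workload.
export JAVA_TOOL_OPTIONS="-Xmx4096m \
  -XX:MaxDirectMemorySize=512m \
  -Xss512k \
  -Xlog:gc*:file=/tmp/gc.log:time,level,tags"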

The following script can assist in issue analysis:

#!/usr/bin/env bash

# Set Java process ID (auto-detect if only one Java process is running)
PID=$(pgrep -f "java" | head -n 1)
if [[ -z "$PID" ]]; then
    echo "No Java process found. Exiting."
    exit 1
fi

echo "Analyzing memory usage for Java process ID: $PID"

echo_section() { echo -e "\n\e[1;34m[$1]\e[0m"; }

echo_section "1. Off-Heap Memory (Direct Buffers, JNI, Metaspace)"
# Count DirectByteBuffer instances as a rough proxy for direct-buffer usage
jcmd $PID GC.class_histogram | grep -i directbytebuffer
# Show the configured direct-memory cap (if any)
jcmd $PID VM.flags -all | grep -i MaxDirectMemorySize

echo_section "2. Thread Count and Stack Size"
jcmd $PID Thread.print | grep "java.lang.Thread.State" | wc -l

echo_section "3. JVM Pointer & Metadata Overhead"
jcmd $PID VM.flags -all | grep -i UseCompressedOops

echo_section "4. Kubernetes cgroup Memory Usage"
if [[ -f /sys/fs/cgroup/memory/memory.usage_in_bytes ]]; then
    cat /sys/fs/cgroup/memory/memory.usage_in_bytes
else
    echo "cgroup memory file not found. Are you inside a container?"
fi

echo_section "5 Check Native Memory Leaks (Libraries, Malloc)"
lsof -p $PID | grep deleted
pmap $PID | sort -k2 -nr | head -20
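
As a follow-up, since NMT is already enabled, you could also take an NMT baseline and diff against it later to confirm that no tracked category is growing, and compare NMT's committed total with the resident set size the kernel (and the cgroup OOM killer) actually sees; a large gap typically points to memory NMT does not track (direct buffers, memory-mapped files, native allocations made by libraries, glibc malloc arenas). A sketch, reusing the $PID variable from the script above:

# Record an NMT baseline, then diff against it later to spot growth
jcmd $PID VM.native_memory baseline

# ... let the application run under load for a while, then:
jcmd $PID VM.native_memory summary.diff

# Resident set size as seen by the kernel, for comparison with NMT's committed total
grep VmRSS /proc/$PID/status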

Upvotes: 1

thecarpy

Reputation: 900

This looks terribly similar to issues I encountered with containerized JVMs. The container would get OOMKilled regularly after some time under heavy load. This happened exclusively in containers.

I upgraded the JVM to the then-latest LTS version and used the -XX:+UseContainerSupport JVM parameter (introduced in Java 10) to alleviate the problem.
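
As an illustration only (the jar path and the heap percentage are placeholders), a container-aware launch on JDK 10+ might look like this, letting the JVM derive its sizes from the cgroup memory limit rather than from the host's RAM:

# Sketch: UseContainerSupport is enabled by default on JDK 10+;
# MaxRAMPercentage sizes the heap from the container's memory limit.
exec java \
  -XX:+UseContainerSupport \
  -XX:MaxRAMPercentage=50.0 \
  -jar /app/app.jar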

NB: This of course assumes that the application runs without issues on a physical box with 8G of RAM, and that the host on which the container runs does not run out of RAM for some other reason. Troubleshooting OOMKills is complex.

Update:

You can troubleshoot this with pmap: run it over time and see which elements increase in size. Another thing you could monitor is the output of ls /proc/$pid/fd | wc -l (where $pid is the java process PID) and check whether it increases over time. This counts all open files and sockets, as well as a number of other resources used by the java process.

You could create a loop in the shell that sleeps between iterations (10 minutes in the example below) and issues the commands:

while true
do
    echo "$(date)"
    # Find the java PID inside the pod, then snapshot its memory map and fd count
    _pid=$(kubectl exec <pod> -t -- ps -ef | grep -m 1 java | awk '{print $2}')
    kubectl exec <pod> -t -- pmap $_pid >> /tmp/pmaps
    kubectl exec <pod> -t -- ls /proc/$_pid/fd | wc -l >> /tmp/number_fd
    sleep 600  # every 10 minutes
done

Come back a few hours later and you should have an idea of what is increasing. Press Ctrl+C to break the loop.
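
To read the collected snapshots afterwards (assuming the procps pmap, whose default output ends with a "total NNNNK" summary line), something like this shows the trend:

# Total mapped memory per snapshot -- steady growth here points at a native leak
grep total /tmp/pmaps

# Open file-descriptor count per snapshot
cat /tmp/number_fd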

Upvotes: 1
