Reputation: 408
I have a Java application configured with -Xmx4096m. The application itself is deployed in a k8s pod with a memory limit of 8192Mi. After doing some analysis with the command jcmd 8 VM.native_memory summary, the output (converted to MB) is as follows:
Native Memory Tracking:

Total: reserved=6009MB, committed=4797MB

- Java Heap (reserved=4096MB, committed=4094MB)
        (mmap: reserved=4096MB, committed=4094MB)
- Class (reserved=1097MB, committed=82MB)
        (classes #14238)
        (malloc=3MB #35561)
        (mmap: reserved=1094MB, committed=79MB)
- Thread (reserved=254MB, committed=254MB)
        (thread #253)
        (stack: reserved=253MB, committed=253MB)
        (malloc=0.8MB #1506)
        (arena=0.3MB #501)
- Code (reserved=254MB, committed=65MB)
        (malloc=11MB #17281)
        (mmap: reserved=244MB, committed=54MB)
- GC (reserved=226MB, committed=226MB)
        (malloc=42MB #192857)
        (mmap: reserved=184MB, committed=184MB)
- Compiler (reserved=1.1MB, committed=1.1MB)
        (malloc=1MB #2175)
        (arena=0.1MB #6)
- Internal (reserved=50MB, committed=50MB)
        (malloc=50MB #178156)
        (mmap: reserved=0.03MB, committed=0.03MB)
- Symbol (reserved=16MB, committed=16MB)
        (malloc=13MB #129700)
        (arena=3MB #1)
- Native Memory Tracking (reserved=8.5MB, committed=8.5MB)
        (malloc=0.03MB #339)
        (tracking overhead=8.5MB)
- Arena Chunk (reserved=0.2MB, committed=0.2MB)
        (malloc=0.2MB)
- Unknown (reserved=8MB, committed=0MB)
        (mmap: reserved=8MB, committed=0MB)
This never increases, but the top command shows 7.9 GB of memory usage for the java process. The application then gets killed when it reaches the 8GB limit with:
Last State: Terminated
Reason: OOMKilled
So my question is: where is this memory going? In other words, what should be the next debugging steps to find the root cause of the issue?
Upvotes: 6
Views: 237
Reputation: 2169
If you get OOMKilled even though NMT shows usage well below the limit, you could try adding JVM flags such as these (a full launch command is sketched after the list):
-XX:MaxDirectMemorySize=512m
-Xss512k
-Xlog:gc*:file=/tmp/gc.log:time,level,tags
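For instance (just a sketch; app.jar stands in for the real entry point), the combined launch command could look like this:
java -Xmx4096m \
     -XX:MaxDirectMemorySize=512m \
     -Xss512k \
     -Xlog:gc*:file=/tmp/gc.log:time,level,tags \
     -jar app.jar
-Xss512k halves the default 1MB thread stacks, which matters here since NMT already shows 253 threads reserving ~253MB of stack, and -Xlog:gc* requires JDK 9+ (on JDK 8 use -Xloggc:/tmp/gc.log -XX:+PrintGCDetails instead).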
The following script can assist in issue analysis:
#!/usr/bin/env bash
# Set Java process ID (auto-detect the first Java process found)
PID=$(pgrep -f "java" | head -n 1)
if [[ -z "$PID" ]]; then
    echo "No Java process found. Exiting."
    exit 1
fi
echo "Analyzing memory usage for Java process ID: $PID"

echo_section() { echo -e "\n\e[1;34m[$1]\e[0m"; }

echo_section "1. Off-Heap Memory (Direct Buffers, JNI, Metaspace)"
# GC.class_stats may need -XX:+UnlockDiagnosticVMOptions on JDK 8 and was removed in JDK 16
jcmd "$PID" GC.class_stats | grep -i direct
jcmd "$PID" VM.system_properties | grep java.nio

echo_section "2. Thread Count and Stack Size"
# Each thread entry in the dump contains one "java.lang.Thread.State" line
jcmd "$PID" Thread.print | grep -c "java.lang.Thread.State"

echo_section "3. JVM Pointer & Metadata Overhead"
jcmd "$PID" VM.flags -all | grep -i UseCompressedOops

echo_section "4. Kubernetes cgroup Memory Usage"
if [[ -f /sys/fs/cgroup/memory/memory.usage_in_bytes ]]; then
    cat /sys/fs/cgroup/memory/memory.usage_in_bytes
else
    echo "cgroup memory file not found. Are you inside a container (or on cgroup v2)?"
fi

echo_section "5. Check Native Memory Leaks (Libraries, Malloc)"
# Deleted-but-still-open files and the 20 largest memory mappings
lsof -p "$PID" | grep deleted
pmap "$PID" | sort -k2 -nr | head -20
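If the analysis has to run inside the pod, one way (a sketch, assuming the container image ships bash, jcmd, lsof and pmap, and that the script is saved locally as memcheck.sh) is to copy it into the pod and run it via kubectl:
kubectl cp memcheck.sh <pod>:/tmp/memcheck.sh
kubectl exec <pod> -- bash /tmp/memcheck.sh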
Upvotes: 1
Reputation: 900
This looks very similar to issues I encountered with containerized JVMs. The containers would get OOM-killed regularly after some time under heavy load, and this happened exclusively in containers.
I upgraded the JVM to the then-latest LTS version and used the -XX:+UseContainerSupport JVM parameter (introduced in Java 10) to alleviate the problem.
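For example (just a sketch; app.jar is a placeholder, and -XX:MaxRAMPercentage needs JDK 10+ or 8u191+), letting the JVM size the heap from the container limit instead of a fixed -Xmx could look like this:
java -XX:+UseContainerSupport -XX:MaxRAMPercentage=50.0 -jar app.jar
With the 8192Mi limit from the question this caps the heap at roughly 4GB and leaves the rest for metaspace, thread stacks, code cache and other native allocations.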
NB: This of course assumes that the application runs without issues on a physical box with 8G RAM, and that the host on top of which the container runs does not run out of RAM for some other reason. Troubleshooting OOMKills is complex.
Update:
You can troubleshoot this with pmap: run it over time and see which elements increase in size. Another thing you could monitor is the output of ls /proc/$pid/fd | wc -l (where $pid is the java process PID) and check whether it increases over time. This counts all open files and sockets, as well as a number of other resources used by the java process.
You could create a loop in the shell that issues the commands and sleeps for 10 minutes between iterations:
while true
do
    echo $(date)
    # Find the PID of the java process inside the pod
    _pid=$(kubectl exec <pod> -t -- ps -ef | grep -m 1 java | awk '{print $2}')
    kubectl exec <pod> -t -- pmap $_pid >> /tmp/pmaps
    kubectl exec <pod> -t -- ls /proc/$_pid/fd | wc -l >> /tmp/number_fd
    sleep 600 # every 10 minutes
done
Come back a few hours later and you should have an idea of what increases. Ctrl+c to break the loop.
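To see the trend at a glance afterwards (a small sketch, assuming the default pmap output format, whose last line reports a total), you can pull the totals out of the collected files:
grep -i total /tmp/pmaps    # overall mapped size, one line per snapshot
cat /tmp/number_fd          # open file-descriptor count, one line per snapshot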
Upvotes: 1