sunsin1985
sunsin1985

Reputation: 2607

Difference between Resident Set Size (RSS) and Java total committed memory (NMT) for a JVM running in Docker container

Scenario:

I have a JVM running in a docker container. I did some memory analysis using two tools: 1) top 2) Java Native Memory Tracking. The numbers look confusing and I am trying to find whats causing the differences.

Question:

The RSS is reported as 1272MB for the Java process and the Total Java Memory is reported as 790.55 MB. How can I explain where did the rest of the memory 1272 - 790.55 = 481.44 MB go?

Why I want to keep this issue open even after looking at this question on SO:

I did see the answer and the explanation makes sense. However, after getting output from Java NMT and pmap -x , I am still not able to concretely map which java memory addresses are actually resident and physically mapped. I need some concrete explanation (with detailed steps) to find whats causing this difference between RSS and Java Total committed memory.

Top Output

enter image description here

Java NMT

enter image description here

Docker memory stats

enter image description here

Graphs

I have a docker container running for most than 48 hours. Now, when I see a graph which contains:

  1. Total memory given to the docker container = 2 GB
  2. Java Max Heap = 1 GB
  3. Total committed (JVM) = always less than 800 MB
  4. Heap Used (JVM) = always less than 200 MB
  5. Non Heap Used (JVM) = always less than 100 MB.
  6. RSS = around 1.1 GB.

So, whats eating the memory between 1.1 GB (RSS) and 800 MB (Java Total committed memory)?

enter image description here

Upvotes: 71

Views: 46435

Answers (2)

user2610067
user2610067

Reputation: 61

Disclaimer: I am not an expert

I had a production incident recently when under heavy load, pods had a big jump in RSS and Kubernetes killed the pods. There was no OOM error exception, but Linux stopped the process in the most hardcore way.

There was a big gap between RSS and total reserved space by JVM. Heap memory, native memory, threads, everything looked ok, however RSS was big.

It was found out that it is due to the fact how malloc works internally. There are big gaps in the memory where malloc takes chunks of memory from. If there are a lot of cores on your machine, malloc tries to adapt and give every core each own space to take free memory from to avoid resource contention. Setting up export MALLOC_ARENA_MAX=2 solved the issue. You can find more about this situation here:

  1. Growing resident memory usage (RSS) of Java Process
  2. https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior
  3. https://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html
  4. https://github.com/jeffgriffith/native-jvm-leaks

P.S. I don't know why there was a jump in RSS memory. Pods are built on Spring Boot + Kafka.

Upvotes: 6

VonC
VonC

Reputation: 1324268

You have some clue in " Analyzing java memory usage in a Docker container" from Mikhail Krestjaninoff:

(And to be clear, in May 2019, three years later, the situation does improves with openJDK 8u212 )

Resident Set Size is the amount of physical memory currently allocated and used by a process (without swapped out pages). It includes the code, data and shared libraries (which are counted in every process which uses them)

Why does docker stats info differ from the ps data?

Answer for the first question is very simple - Docker has a bug (or a feature - depends on your mood): it includes file caches into the total memory usage info. So, we can just avoid this metric and use ps info about RSS.

Well, ok - but why is RSS higher than Xmx?

Theoretically, in case of a java application

RSS = Heap size + MetaSpace + OffHeap size

where OffHeap consists of thread stacks, direct buffers, mapped files (libraries and jars) and JVM code itse

Since JDK 1.8.40 we have Native Memory Tracker!

As you can see, I’ve already added -XX:NativeMemoryTracking=summary property to the JVM, so we can just invoke it from the command line:

docker exec my-app jcmd 1 VM.native_memory summary

(This is what the OP did)

Don’t worry about the “Unknown” section - seems that NMT is an immature tool and can’t deal with CMS GC (this section disappears when you use an another GC).

Keep in mind, that NMT displays “committed” memory, not "resident" (which you get through the ps command). In other words, a memory page can be committed without considering as a resident (until it directly accessed).

That means that NMT results for non-heap areas (heap is always preinitialized) might be bigger than RSS values.

(that is where "Why does a JVM report more committed memory than the linux process resident set size?" comes in)

As a result, despite the fact that we set the jvm heap limit to 256m, our application consumes 367M. The “other” 164M are mostly used for storing class metadata, compiled code, threads and GC data.

First three points are often constants for an application, so the only thing which increases with the heap size is GC data.
This dependency is linear, but the “k” coefficient (y = kx + b) is much less then 1.


More generally, this seems to be followed by issue 15020 which reports a similar issue since docker 1.7

I'm running a simple Scala (JVM) application which loads a lot of data into and out of memory.
I set the JVM to 8G heap (-Xmx8G). I have a machine with 132G memory, and it can't handle more than 7-8 containers because they grow well past the 8G limit I imposed on the JVM.

(docker stat was reported as misleading before, as it apparently includes file caches into the total memory usage info)

docker stat shows that each container itself is using much more memory than the JVM is supposed to be using. For instance:

CONTAINER CPU % MEM USAGE/LIMIT MEM % NET I/O
dave-1 3.55% 10.61 GB/135.3 GB 7.85% 7.132 MB/959.9 MB
perf-1 3.63% 16.51 GB/135.3 GB 12.21% 30.71 MB/5.115 GB

It almost seems that the JVM is asking the OS for memory, which is allocated within the container, and the JVM is freeing memory as its GC runs, but the container doesn't release the memory back to the main OS. So... memory leak.

Upvotes: 67

Related Questions