Ivan

Reputation: 3836

Troubleshooting unbounded Java Resident Set Size (RSS) growth

I have a standalone Java application which has:

-Xmx1024m -Xms1024m -XX:MaxPermSize=256m -XX:PermSize=256m

Over time it consumes more and more memory, starts to swap (and slow down), and has eventually died a number of times (no OOM error or heap dump, it just died; nothing in /var/log/messages).

What I've tried so far:

  1. Heap dumps: live objects take 200-300 MB out of the 1 GB heap --> the heap is fine
  2. The number of live threads is fairly constant (~60-70) --> thread stacks are fine
  3. JMX stops answering at some point (maybe it still answers, but the timeout is too low)
  4. Turning off swap makes it die faster
  5. strace: everything seems to slow down a bit, the app still hasn't died, and I'm not sure what to look for there
  6. Checking top: VIRT grows to 5.5 GB, RSS to 3.7 GB
  7. Checking vmstat (we obviously start to swap):

     --------------------------procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
    Sun Jul 22 16:10:26 2012:  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
    Sun Jul 22 16:48:41 2012:  0  0 138652 2502504  40360 706592    1    0   169    21 1047  206 20  1 74  4  0
    . . . 
    Sun Jul 22 18:10:59 2012:  0  0 138648  24816  58600 1609212    0    0   124   669  913 24436 43 22 34  2  0
    Sun Jul 22 19:10:22 2012: 33  1 138644  33304   4960 1107480    0    0   100   536  810 19536 44 22 23 10  0
    Sun Jul 22 20:10:28 2012: 54  1 213916  26928   2864 578832    3  360   100   710  639 12702 43 16 30 11  0
    Sun Jul 22 21:10:43 2012:  0  0 629256  26116   2992 467808   84  176   278  1320 1293 24243 50 19 29  3  0
    Sun Jul 22 22:10:55 2012:  4  0 772168  29136   1240 165900  203   94   435  1188 1278 21851 48 16 33  2  0
    Sun Jul 22 23:10:57 2012:  0  1 2429536  26280   1880 169816 6875 6471  7081  6878 2146 8447 18 37  1 45  0
    
  8. sar also shows steady %system growth, i.e. swapping:

     15:40:02          CPU     %user     %nice   %system   %iowait    %steal     %idle
     17:40:01          all     51.00      0.00      7.81      3.04      0.00     38.15
     19:40:01          all     48.43      0.00     18.89      2.07      0.00     30.60
     20:40:01          all     43.93      0.00     15.84      5.54      0.00     34.70
     21:40:01          all     46.14      0.00     15.44      6.57      0.00     31.85
     22:40:01          all     44.25      0.00     20.94      5.43      0.00     29.39
     23:40:01          all     18.24      0.00     52.13     21.17      0.00      8.46
     12:40:02          all     22.03      0.00     41.70     15.46      0.00     20.81
    
  9. Checking pmap gives the following largest contributors:

      000000005416c000 1505760K rwx--    [ anon ]
      00000000b0000000 1310720K rwx--    [ anon ]
      00002aaab9001000 2079748K rwx--    [ anon ]
    
  10. Trying to correlate the addresses I got from pmap with the addresses dumped by strace gave no matches

  11. Adding more memory is not practical (it would just make the problem appear later)

  12. Switching JVMs is not possible (the environment is not under our control)
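One way to confirm that the growth really is native (non-heap) memory is to track the total size of writable anonymous mappings over time. A minimal sketch, assuming `pmap`-style output like the snippets above, where the second column is the mapping size in KB:

```shell
# count_anon_kb: sum the sizes (in KB) of writable anonymous mappings
# from `pmap <pid>` output fed on stdin. Sampling this periodically
# shows whether native allocations are what keeps growing.
count_anon_kb() {
  awk '/anon/ && /rw/ { gsub(/K/, "", $2); total += $2 } END { print total + 0 }'
}

# usage: pmap "$PID" | count_anon_kb
```

Running this every few minutes and plotting the numbers would show whether the anonymous mappings (malloc heaps, thread stacks, JVM-internal buffers) account for the RSS climb.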

And the question is: what else can I try to track down the cause of the problem, or to work around it?

Upvotes: 2

Views: 2397

Answers (3)

Lari Hotari

Reputation: 5310

There is a known problem with Java and glibc >= 2.10 (which includes Ubuntu >= 10.04 and RHEL >= 6).

The cure is to set this environment variable: export MALLOC_ARENA_MAX=4

There is an IBM article about setting MALLOC_ARENA_MAX: https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en

This blog post says that "resident memory has been known to creep in a manner similar to a memory leak or memory fragmentation."

Search for MALLOC_ARENA_MAX on Google or Stack Overflow for more references.
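For context on why this matters: on 64-bit glibc the default limit is 8 arenas per core, and each arena can reserve up to 64 MB of virtual address space, which is why VIRT can balloon on multi-core boxes. A back-of-the-envelope check (the 8x-per-core and 64 MB figures are the glibc defaults, not something measured from this particular system):

```shell
# Worst-case virtual memory reservable by glibc malloc arenas on a
# 64-bit system: by default up to 8 arenas per core, each able to
# reserve a 64 MB heap of address space.
cores=$(getconf _NPROCESSORS_ONLN)
arenas=$((8 * cores))
echo "default arena limit: $arenas (up to $((arenas * 64)) MB of virtual memory)"
```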

You might also want to tune other malloc options to optimize for low fragmentation of allocated memory:

# tune glibc memory allocation, optimize for low fragmentation
# limit the number of arenas
export MALLOC_ARENA_MAX=2
# disable dynamic mmap threshold, see M_MMAP_THRESHOLD in "man mallopt"
export MALLOC_MMAP_THRESHOLD_=131072
export MALLOC_TRIM_THRESHOLD_=131072
export MALLOC_TOP_PAD_=131072
export MALLOC_MMAP_MAX_=65536
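To check whether the arena limit actually took effect, the ~64 MB arena blocks can be counted in pmap output before and after the change. A rough sketch: it assumes each arena shows up as a full 65536K anonymous mapping, whereas on some systems an arena appears as a pair of mappings summing to 64 MB, which this simple grep would miss:

```shell
# count_arena_blocks: count anonymous 64 MB (65536K) mappings in
# `pmap <pid>` output on stdin -- a rough proxy for the number of
# glibc malloc arenas the process has spawned.
count_arena_blocks() {
  grep -c '65536K.*anon'
}

# usage: pmap "$PID" | count_arena_blocks
```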

Upvotes: 1

Ivan

Reputation: 3836

The problem was a profiler library that was attached: it recorded CPU calls and allocation sites, and needed native memory to store all of that.

So, human factor here :)

Upvotes: 1

Stephen C

Reputation: 719446

Something in your JVM is using an "unbounded" amount of non-Heap memory. Some possible candidates are:

  • Thread stacks.
  • Native heap allocated by some native code library.
  • Memory-mapped files.

The first possibility will show up as a large (and increasing) number of threads when you take a thread stack dump. (Just check it again, OK?)
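A quick way to keep an eye on the first possibility is to count thread headers in a jstack dump over time; a steadily climbing count would point at thread stacks. A minimal sketch, relying on the fact that each thread's header line in a HotSpot thread dump starts with the thread's quoted name:

```shell
# count_threads: count thread entries in `jstack <pid>` output on
# stdin. In HotSpot thread dumps, each thread's header line begins
# with its name in double quotes.
count_threads() {
  grep -c '^"'
}

# usage: jstack "$PID" | count_threads
```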

The second one you can (probably) eliminate if your application (or some 3rd-party library it uses) doesn't use any native libraries.

The third one you can eliminate if your application (or some 3rd-party library it uses) doesn't use memory-mapped files.
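Memory-mapped files can be ruled in or out directly from /proc, since file-backed mappings carry a path while malloc/heap memory shows up as anonymous. An illustrative sketch (the field layout is that of /proc/<pid>/maps; the /data/cache.bin path used in testing it is made up):

```shell
# file_backed_rw: list the unique file paths behind writable
# file-backed mappings from /proc/<pid>/maps read on stdin. Large
# entries here would implicate memory-mapped files rather than the
# native heap. Fields: address perms offset dev inode pathname.
file_backed_rw() {
  awk '$2 ~ /w/ && $6 ~ /^\// { print $6 }' | sort -u
}

# usage: file_backed_rw < /proc/"$PID"/maps
```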


I would guess that the reason you are not seeing OOMEs is that the JVM is being killed by the Linux OOM killer. It is also possible that the JVM is bailing out in native code (e.g. due to a malloc failure not being handled properly), but I'd have thought a JVM crash dump would be the more likely outcome in that case ...
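Whether the OOM killer was responsible is easy to verify, because the kernel logs each kill. A small sketch that scans log text for the telltale lines (the sample message used to test it is illustrative):

```shell
# oom_kills: filter kernel log text on stdin down to OOM-killer
# activity. If the JVM was killed this way, lines like
# "Out of memory: Kill process <pid> (java) ..." will appear.
oom_kills() {
  grep -iE 'killed process|out of memory'
}

# usage: dmesg | oom_kills        (or: oom_kills < /var/log/messages)
```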

Upvotes: 1
