Reputation: 647
In the last few weeks I have been testing different JVM setting for my Glassfish server. The main settings for heap (among others) are: -Xms512m, -Xmx512m, -XX:NewRatio=2. I tried different GC setting but I still have problems with long pauses after few days from starting server. I noticed following:
1. -XX:+UseParallelGC -XX:+UseParallelOldGC - minor GC were occuring about every minute, major GC every 18 hours. I had no problems with minor GC but there were problems with major GC after 5 days. At first major GC pauses lasted about 100-200ms but at the end the last pause lasted 70 seconds.
2. -XX:+UseConcMarkSweepGC -XX:+UseParNewGC - almost same as the above. Minor GC were fine, but major GC (not full) pauses were getting very long. I noticed problems with high class unloading at GC (CMS Final Remark) phase which is stop the world phase.
3. -XX:+UseConcMarkSweepGC -XX:+UseParNewGC and -XX:MaxGCPauseMillis=5000. I was testing this for only a day because the second major GC last (not full) already lasted for about 20 seconds so I thought there is something else wrong.
4. -XX:+UseG1GC, -XX:MaxGCPauseMillis=5000, -XX:+UseStringDeduplication without -XX:NewRatio=2 option - the major GC (not full) is occurring every 12 hours, I have noticed some problems already:
2015-05-31T18:25:25.145+0200: 83383.897: [GC concurrent-mark-start]
2015-05-31T18:25:35.563+0200: 83394.312: [GC concurrent-mark-end, 10.4145795 secs]
2015-05-31T18:25:35.563+0200: 83394.312: [GC remark 83394.312: [Finalize Marking, 0.0002939 secs] 83394.312: [GC ref-proc, 1.2128584 secs] 83395.525: [Unloading, 14.5180500 secs], 15.7320854 secs]
[Times: user=0.14 sys=0.22, real=15.73 secs]
GC remark phase took 15 seconds which is not acceptable for me. You can see that unloading took most of the time. That also happened before when using other GC so I think there must be problem with class unloading.
Summary: all GC run OK for some time but after few days problems started to appear with long pause times. I don't know why it is working fine first few days and then suddenly the results are very bad. I noticed that higher pause time was caused by class unloading so I am wondering if there are some setting to get better results. Also I want to know which GC are you recommending to me to use? I have internal web application running on glassfish server on PC with 8gb of RAM, i7 processor and with Windows 8 operating system. There will be max 10 clients connection at same time but it must have long uptime and there can not be long pause times (max 5 seconds). Please advise me what else I can do to get shorter pause times.
And one more question: what could be disadvantage using G1GC instead of CMS or ParallelGC in my case? Is the heap to small to use G1GC?
EDIT: G1GC log right before and after long pause GC remark phase
2015-05-31T18:25:25.004+0200: 83383.755: [GC pause (G1 Evacuation Pause) (young) (initial-mark), 0.1280453 secs]
[Parallel Time: 116.2 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 83383757.6, Avg: 83383757.7, Max: 83383757.7, Diff: 0.0]
[Ext Root Scanning (ms): Min: 97.8, Avg: 98.3, Max: 98.5, Diff: 0.7, Sum: 393.1]
[Update RS (ms): Min: 0.2, Avg: 4.0, Max: 14.8, Diff: 14.6, Sum: 16.1]
[Processed Buffers: Min: 1, Avg: 6.0, Max: 16, Diff: 15, Sum: 24]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 0.2, Avg: 2.5, Max: 3.7, Diff: 3.5, Sum: 10.2]
[Termination (ms): Min: 0.0, Avg: 8.5, Max: 11.4, Diff: 11.4, Sum: 34.2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 113.4, Avg: 113.4, Max: 113.5, Diff: 0.0, Sum: 453.8]
[GC Worker End (ms): Min: 83383871.1, Avg: 83383871.1, Max: 83383871.1, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[String Dedup Fixup: 2.2 ms, GC Workers: 4]
[Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Table Fixup (ms): Min: 2.0, Avg: 2.1, Max: 2.1, Diff: 0.1, Sum: 8.3]
[Clear CT: 0.1 ms]
[Other: 9.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 8.8 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.3 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 215.0M(215.0M)->0.0B(215.0M) Survivors: 7168.0K->7168.0K Heap: 451.5M(512.0M)->236.6M(512.0M)]
[Times: user=0.08 sys=0.02, real=0.13 secs]
2015-05-31T18:25:25.129+0200: 83383.883: [GC concurrent-root-region-scan-start]
2015-05-31T18:25:25.129+0200: 83383.883: [GC concurrent-string-deduplication, 160.0B->0.0B(160.0B), avg 48.3%, 0.0000070 secs]
[Last Exec: 0.0000070 secs, Idle: 23.1834927 secs, Blocked: 0/0.0000000 secs]
[Inspected: 3]
[Skipped: 0( 0.0%)]
[Hashed: 3(100.0%)]
[Known: 0( 0.0%)]
[New: 3(100.0%) 160.0B]
[Deduplicated: 3(100.0%) 160.0B(100.0%)]
[Young: 3(100.0%) 160.0B(100.0%)]
[Old: 0( 0.0%) 0.0B( 0.0%)]
[Total Exec: 2868/0.1946124 secs, Idle: 2868/83382.9701762 secs, Blocked: 13/0.0032760 secs]
[Inspected: 304493]
[Skipped: 0( 0.0%)]
[Hashed: 163708( 53.8%)]
[Known: 44808( 14.7%)]
[New: 259685( 85.3%) 21.9M]
[Deduplicated: 160467( 61.8%) 10.6M( 48.3%)]
[Young: 83546( 52.1%) 6270.6K( 57.8%)]
[Old: 76921( 47.9%) 4571.3K( 42.2%)]
[Table]
[Memory Usage: 4291.8K]
[Size: 131072, Min: 1024, Max: 16777216]
[Entries: 133319, Load: 101.7%, Cached: 6107, Added: 142389, Removed: 9070]
[Resize Count: 7, Shrink Threshold: 87381(66.7%), Grow Threshold: 262144(200.0%)]
[Rehash Count: 0, Rehash Threshold: 120, Hash Seed: 0x0]
[Age Threshold: 3]
[Queue]
[Dropped: 0]
2015-05-31T18:25:25.145+0200: 83383.897: [GC concurrent-root-region-scan-end, 0.0140467 secs]
2015-05-31T18:25:25.145+0200: 83383.897: [GC concurrent-mark-start]
2015-05-31T18:25:35.563+0200: 83394.312: [GC concurrent-mark-end, 10.4145795 secs]
2015-05-31T18:25:35.563+0200: 83394.312: [GC remark 83394.312: [Finalize Marking, 0.0002939 secs] 83394.312: [GC ref-proc, 1.2128584 secs] 83395.525: [Unloading, 14.5180500 secs], 15.7320854 secs]
[Times: user=0.14 sys=0.22, real=15.73 secs]
2015-05-31T18:25:51.288+0200: 83410.045: [GC cleanup 334M->326M(512M), 0.2836092 secs]
[Times: user=0.00 sys=0.00, real=0.28 secs]
2015-05-31T18:25:51.570+0200: 83410.328: [GC concurrent-cleanup-start]
2015-05-31T18:25:51.570+0200: 83410.328: [GC concurrent-cleanup-end, 0.0000669 secs]
2015-05-31T18:26:03.732+0200: 83422.482: [GC pause (G1 Evacuation Pause) (young), 0.1031257 secs]
[Parallel Time: 91.6 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 83422481.7, Avg: 83422481.7, Max: 83422481.8, Diff: 0.0]
[Ext Root Scanning (ms): Min: 1.3, Avg: 1.7, Max: 2.7, Diff: 1.4, Sum: 6.9]
[Update RS (ms): Min: 0.0, Avg: 22.7, Max: 89.8, Diff: 89.8, Sum: 90.8]
[Processed Buffers: Min: 0, Avg: 7.3, Max: 15, Diff: 15, Sum: 29]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 0.5, Avg: 2.4, Max: 3.4, Diff: 2.9, Sum: 9.5]
[Termination (ms): Min: 0.0, Avg: 64.7, Max: 86.3, Diff: 86.3, Sum: 258.9]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 91.6, Avg: 91.6, Max: 91.6, Diff: 0.0, Sum: 366.3]
[GC Worker End (ms): Min: 83422573.3, Avg: 83422573.3, Max: 83422573.3, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[String Dedup Fixup: 2.1 ms, GC Workers: 4]
[Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Table Fixup (ms): Min: 1.9, Avg: 1.9, Max: 1.9, Diff: 0.1, Sum: 7.7]
[Clear CT: 0.1 ms]
[Other: 9.3 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 8.8 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 215.0M(215.0M)->0.0B(19.0M) Survivors: 7168.0K->6144.0K Heap: 443.6M(512.0M)->228.2M(512.0M)]
[Times: user=0.30 sys=0.01, real=0.10 secs]
2015-05-31T18:26:03.848+0200: 83422.597: [GC concurrent-string-deduplication, 160.0B->0.0B(160.0B), avg 48.3%, 0.0123951 secs]
[Last Exec: 0.0123951 secs, Idle: 38.7017788 secs, Blocked: 0/0.0000000 secs]
[Inspected: 3]
[Skipped: 0( 0.0%)]
[Hashed: 3(100.0%)]
[Known: 0( 0.0%)]
[New: 3(100.0%) 160.0B]
[Deduplicated: 3(100.0%) 160.0B(100.0%)]
[Young: 3(100.0%) 160.0B(100.0%)]
[Old: 0( 0.0%) 0.0B( 0.0%)]
[Total Exec: 2869/0.2070075 secs, Idle: 2869/83421.6719550 secs, Blocked: 13/0.0032760 secs]
[Inspected: 304496]
[Skipped: 0( 0.0%)]
[Hashed: 163711( 53.8%)]
[Known: 44808( 14.7%)]
[New: 259688( 85.3%) 21.9M]
[Deduplicated: 160470( 61.8%) 10.6M( 48.3%)]
[Young: 83549( 52.1%) 6270.8K( 57.8%)]
[Old: 76921( 47.9%) 4571.3K( 42.2%)]
[Table]
[Memory Usage: 2565.5K]
[Size: 65536, Min: 1024, Max: 16777216]
[Entries: 81061, Load: 123.7%, Cached: 6553, Added: 142396, Removed: 61335]
[Resize Count: 8, Shrink Threshold: 43690(66.7%), Grow Threshold: 131072(200.0%)]
[Rehash Count: 0, Rehash Threshold: 120, Hash Seed: 0x0]
[Age Threshold: 3]
[Queue]
[Dropped: 0]
2015-05-31T18:26:05.769+0200: 83424.518: [GC pause (G1 Evacuation Pause) (mixed), 0.2232916 secs]
[Parallel Time: 216.7 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 83424518.3, Avg: 83424518.3, Max: 83424518.3, Diff: 0.0]
[Ext Root Scanning (ms): Min: 1.2, Avg: 1.6, Max: 2.6, Diff: 1.4, Sum: 6.5]
[Update RS (ms): Min: 0.0, Avg: 0.3, Max: 0.4, Diff: 0.4, Sum: 1.2]
[Processed Buffers: Min: 0, Avg: 4.3, Max: 7, Diff: 7, Sum: 17]
[Scan RS (ms): Min: 56.1, Avg: 102.3, Max: 144.4, Diff: 88.3, Sum: 409.2]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.3]
[Object Copy (ms): Min: 50.4, Avg: 97.6, Max: 157.7, Diff: 107.2, Sum: 390.2]
[Termination (ms): Min: 0.0, Avg: 14.8, Max: 19.8, Diff: 19.8, Sum: 59.1]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 216.6, Avg: 216.6, Max: 216.6, Diff: 0.0, Sum: 866.5]
[GC Worker End (ms): Min: 83424734.9, Avg: 83424734.9, Max: 83424734.9, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[String Dedup Fixup: 1.5 ms, GC Workers: 4]
[Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Table Fixup (ms): Min: 1.4, Avg: 1.4, Max: 1.4, Diff: 0.0, Sum: 5.6]
[Clear CT: 0.2 ms]
[Other: 4.8 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.9 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.2 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.2 ms]
[Eden: 19.0M(19.0M)->0.0B(21.0M) Survivors: 6144.0K->4096.0K Heap: 247.2M(512.0M)->175.2M(512.0M)]
[Times: user=0.09 sys=0.00, real=0.22 secs]
2015-05-31T18:26:05.992+0200: 83424.742: [GC concurrent-string-deduplication, 640.0B->152.0B(488.0B), avg 48.3%, 0.0000246 secs]
[Last Exec: 0.0000246 secs, Idle: 2.1442834 secs, Blocked: 0/0.0000000 secs]
[Inspected: 6]
[Skipped: 0( 0.0%)]
[Hashed: 5( 83.3%)]
[Known: 0( 0.0%)]
[New: 6(100.0%) 640.0B]
[Deduplicated: 5( 83.3%) 488.0B( 76.3%)]
[Young: 5(100.0%) 488.0B(100.0%)]
[Old: 0( 0.0%) 0.0B( 0.0%)]
[Total Exec: 2870/0.2070321 secs, Idle: 2870/83423.8162384 secs, Blocked: 13/0.0032760 secs]
[Inspected: 304502]
[Skipped: 0( 0.0%)]
[Hashed: 163716( 53.8%)]
[Known: 44808( 14.7%)]
[New: 259694( 85.3%) 21.9M]
[Deduplicated: 160475( 61.8%) 10.6M( 48.3%)]
[Young: 83554( 52.1%) 6271.2K( 57.8%)]
[Old: 76921( 47.9%) 4571.3K( 42.2%)]
[Table]
[Memory Usage: 2564.6K]
[Size: 65536, Min: 1024, Max: 16777216]
[Entries: 81026, Load: 123.6%, Cached: 6553, Added: 142397, Removed: 61371]
[Resize Count: 8, Shrink Threshold: 43690(66.7%), Grow Threshold: 131072(200.0%)]
[Rehash Count: 0, Rehash Threshold: 120, Hash Seed: 0x0]
[Age Threshold: 3]
[Queue]
[Dropped: 0]
2015-05-31T18:26:08.157+0200: 83426.906: [GC pause (G1 Evacuation Pause) (mixed), 0.6216666 secs]
[Parallel Time: 618.5 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 83426906.5, Avg: 83426906.5, Max: 83426906.5, Diff: 0.0]
[Ext Root Scanning (ms): Min: 0.3, Avg: 8.0, Max: 15.7, Diff: 15.3, Sum: 31.9]
[Update RS (ms): Min: 0.0, Avg: 4.5, Max: 8.5, Diff: 8.5, Sum: 17.9]
[Processed Buffers: Min: 0, Avg: 7.0, Max: 18, Diff: 18, Sum: 28]
[Scan RS (ms): Min: 13.4, Avg: 28.4, Max: 65.2, Diff: 51.8, Sum: 113.7]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.2]
[Object Copy (ms): Min: 532.6, Avg: 577.3, Max: 604.5, Diff: 71.9, Sum: 2309.1]
[Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.3, Diff: 0.3, Sum: 0.7]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 618.4, Avg: 618.4, Max: 618.4, Diff: 0.0, Sum: 2473.6]
[GC Worker End (ms): Min: 83427524.9, Avg: 83427524.9, Max: 83427524.9, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[String Dedup Fixup: 1.3 ms, GC Workers: 4]
[Queue Fixup (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Table Fixup (ms): Min: 1.2, Avg: 1.2, Max: 1.3, Diff: 0.1, Sum: 4.9]
[Clear CT: 0.1 ms]
[Other: 1.6 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.0 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.2 ms]
[Eden: 21.0M(21.0M)->0.0B(21.0M) Survivors: 4096.0K->4096.0K Heap: 196.2M(512.0M)->129.4M(512.0M)]
[Times: user=0.08 sys=0.02, real=0.62 secs]
EDIT: Results after few hours:
There is a lot of pages/sec and pages input/sec and also page fault. Is this normal? Where can I set to monitor pages/sec and pages input/sec just for JVM (I only found pages faults)?
Upvotes: 4
Views: 1680
Reputation: 43042
[Times: user=0.14 sys=0.22, real=15.73 secs]
This means that it spends a lot more wall-time GCing than actual CPU time. All GCs you have tried are multi-threaded and purely CPU-bound, meaning that they should generally burn more CPU time than wall time passes.
There are two possible causes that would change that
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
Additionally, since it happened during a remark pause using -XX:-ClassUnloadingWithConcurrentMark
might fix it, but I guess it would just shift the problem to a regular GC.
And maybe it would be useful to trace how much it's actually trying to unload with -XX:+TraceClassUnloading
. Application containers like glassfish might be doing some strange things that causes lots of classes to pile up.
Edit: For monitoring you'll mostly want to keep an eye on free physical memory (minus caches), CPU load, page in/out/faults. Ideally monitor paging at at the process level, since it does not concern the JVM whether another process is waiting for the disks.
As for CMS vs. G1: that's probably unrelated to your problem.
Upvotes: 2
Reputation: 7438
I'd guess you're barking up the wrong tree - I suspect Garbage Collection is not your problem...
You're only running a 512 MiB heap - to me, a long pause with a heap that size would be 1 or 2 seconds. Major pauses in the milliseconds are possible with huge (32 GiB) heaps.
I'm expecting that the issue is actually with your server - either the other applications you've mentioned are using enough memory to push your Java process (which will be around 50% bigger than the heap) into swap / virtual memory - or you're running your application in a virtualised environment (possibly with memory overcommitment / memory balooning issues).
As a really crude indicator, any GC algorithm should be able to churn though 100 MiB of heap per second - so if you're seeing worse than that, look elsewhere for the cause of the issue.
In this case, I believe GC is the symptom, not the problem.
Upvotes: 3