Reputation: 71
We noticed occasional full GCs with the G1 garbage collector, preceded by concurrent-mark overflow. Once a concurrent-mark-reset-for-overflow occurs, the overflow keeps recurring in the following concurrent mark phases. Eventually it leads to a full GC, since concurrent marking no longer seems to make progress.
We have four machines running the same Apache Storm based application with the same data traffic, yet only one of them experiences this, roughly once a week.
Is this related to the bug 'G1 does not expand marking stack when mark stack overflow happens during concurrent marking' (https://bugs.openjdk.java.net/browse/JDK-8065402)?
Following the suggestion on that page, we doubled the concurrent mark threads from 4 to 8 and our heap size from 8 GB to 16 GB. However, the full GCs still happen; the only difference is that they occur later.
Any other suggestions?
Here's the GC log:
Java HotSpot(TM) 64-Bit Server VM (25.65-b01) for linux-amd64 JRE(1.8.0_65b17),
built on Oct 6 2015 17:16:12 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 529167668k(69283408k free), swap 33554424k(33552380k free)
CommandLine flags: -XX:ConcGCThreads=8 -XX:G1ReservePercent=20 -XX:GCLogFileSize=104857600
-XX:InitialHeapSize=17179869184 -XX:InitiatingHeapOccupancyPercent=45 -XX:MaxGCPauseMillis=100
-XX:MaxHeapSize=17179869184 -XX:NumberOfGCLogFiles=10 -XX:ParallelGCThreads=30
-XX:+PrintAdaptiveSizePolicy -XX:PrintFLSStatistics=2 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
...
...
2016-04-13T22:06:37.254-0400: 19839.175: [GC concurrent-root-region-scan-start]
2016-04-13T22:06:37.313-0400: 19839.234: [GC concurrent-root-region-scan-end, 0.0592966 secs]
2016-04-13T22:06:37.313-0400: 19839.234: [GC concurrent-mark-start]
2016-04-13T22:06:38.569-0400: 19840.490: [GC concurrent-mark-reset-for-overflow]
...
2016-04-13T22:06:42.810-0400: 19844.731: [GC concurrent-mark-reset-for-overflow]
...
2016-04-13T22:11:19.253-0400: 20121.175: [GC concurrent-mark-reset-for-overflow]
...
...
...
2016-04-14T01:58:17.254-0400: 33739.176: [GC concurrent-mark-reset-for-overflow]
...
2016-04-14T01:58:36.957-0400: 33758.878: [Full GC (Allocation Failure)
Upvotes: 7
Views: 5275
Reputation: 38910
From the Oracle G1 GC blog:
GC concurrent-mark-reset-for-overflow: This indicates that the global marking stack had become full and there was an overflow of the stack. Concurrent marking detected this overflow and had to reset the data structures to start the marking again.
So increasing -XX:MarkStackSize is one quick win.
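For example, a minimal sketch of that flag change (the sizes below are illustrative assumptions, not tuned recommendations; -XX:MarkStackSizeMax bounds how far the stack may grow):
# Sketch: enlarge the global marking stack used by concurrent marking.
# The 16M/64M values are assumptions; -version is used only to check
# that the JVM accepts the flags. In practice, append them to your
# Storm worker launch command.
java -XX:+UseG1GC -XX:MarkStackSize=16M -XX:MarkStackSizeMax=64M -version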
A few observations on your VM parameters:
Set only these parameters and leave everything else at default values: -XX:MaxGCPauseMillis, -XX:G1HeapRegionSize, -XX:ParallelGCThreads=n, -XX:ConcGCThreads=n.
-XX:G1HeapRegionSize: for a 16 GB heap this works out to 8 MB. Make sure that you maintain 2048 regions.
-XX:MaxGCPauseMillis: if 200 ms is unrealistic for a 16 GB heap, set this value appropriately. The official documentation page recommends how to set -XX:ParallelGCThreads=n and -XX:ConcGCThreads=n depending on the number of cores in your machine.
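As a quick sanity check of the region arithmetic (16 GB / 2048 regions = 8 MB per region; the 200 ms pause goal is just a placeholder here):
# Sketch only: an 8 MB region size keeps a 16 GB heap at about 2048
# regions (16384 MB / 8 MB = 2048). -version merely verifies the flags parse.
java -XX:+UseG1GC -Xmx16g -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -version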
-XX:ParallelGCThreads=n: Sets the number of STW worker threads. Set n to the number of logical processors, up to a value of 8.
-XX:ConcGCThreads=n: Sets the number of parallel marking threads. Set n to approximately 1/4 of the number of parallel garbage collection threads (ParallelGCThreads).
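For instance, on a hypothetical box with 8 logical processors (the core count is an assumption; size it to your hardware rather than keeping the current ParallelGCThreads=30):
# Hypothetical 8 logical processors:
#   ParallelGCThreads = 8 (equal to the processor count, since it is <= 8)
#   ConcGCThreads     = 8 / 4 = 2
java -XX:+UseG1GC -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 -version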
Revisit the -XX:InitialHeapSize=17179869184, -XX:InitiatingHeapOccupancyPercent=45 and -XX:G1ReservePercent=20 parameters. Leave them at their default values unless you have a pressing need to change them.
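Putting the above together, a trimmed command line might look like the sketch below (the marking-stack size and GC thread counts are the assumptions from the earlier examples, and your-app.jar stands in for your real launch target):
# Sketch of a reduced flag set: pause goal, region size, GC thread counts,
# marking stack size and heap size are set explicitly; everything else
# (IHOP, reserve percent, etc.) is left at its default.
java -XX:+UseG1GC \
     -Xms16g -Xmx16g \
     -XX:MaxGCPauseMillis=200 \
     -XX:G1HeapRegionSize=8m \
     -XX:ParallelGCThreads=8 \
     -XX:ConcGCThreads=2 \
     -XX:MarkStackSize=16M \
     -jar your-app.jar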
Visit this page for a better understanding of G1 GC logs.
Upvotes: 8