Reputation: 31
I'm running a large instance of tomcat with the following version of java:
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.7) (6b20-1.9.7-0ubuntu1~10.04.1
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
and with the following parameter set:
-Xms13152m
-Xmx13152m
-Xmn768m
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=60
-XX:+CMSIncrementalMode
-XX:+CMSIncrementalPacing
-XX:CMSIncrementalDutyCycleMin=0
-XX:CMSIncrementalDutyCycle=10
-XX:+DisableExplicitGC
with GC debug statements enabled. Every few hours I see behavior in which a minor GC occurs and the application is stopped for a long time, but the GC itself does not appear to be taking that much time:
{Heap before GC invocations=392 (full 74):
par new generation total 707840K, used 698252K [0x00000004bfa00000, 0x00000004efa00000, 0x00000004efa00000)
eden space 629248K, 99% used [0x00000004bfa00000, 0x00000004e607de48, 0x00000004e6080000)
from space 78592K, 87% used [0x00000004ead40000, 0x00000004ef0a5370, 0x00000004efa00000)
to space 78592K, 0% used [0x00000004e6080000, 0x00000004e6080000, 0x00000004ead40000)
concurrent mark-sweep generation total 12681216K, used 10877603K [0x00000004efa00000, 0x00000007f5a00000, 0x00000007f5a00000)
concurrent-mark-sweep perm gen total 50260K, used 30106K [0x00000007f5a00000, 0x00000007f8b15000, 0x0000000800000000)
13120.154: [GC 13120.154: [ParNew Desired survivor size 40239104 bytes, new threshold 1 (max 4) - age 1: 41912720 bytes, 41912720 total: 698252K->41387K(707840K), 0.1239080 secs] 11575856K->10987714K(13389056K) icms_dc=20 , 0.1239930 secs] [Times: user=0.00 sys=0.00, real=0.12 secs]
Heap after GC invocations=393 (full 74):
par new generation total 707840K, used 41387K [0x00000004bfa00000, 0x00000004efa00000, 0x00000004efa00000)
eden space 629248K, 0% used [0x00000004bfa00000, 0x00000004bfa00000, 0x00000004e6080000)
from space 78592K, 52% used [0x00000004e6080000, 0x00000004e88eace0, 0x00000004ead40000)
to space 78592K, 0% used [0x00000004ead40000, 0x00000004ead40000, 0x00000004efa00000)
concurrent mark-sweep generation total 12681216K, used 10946327K [0x00000004efa00000, 0x00000007f5a00000, 0x00000007f5a00000)
concurrent-mark-sweep perm gen total 50260K, used 30106K [0x00000007f5a00000, 0x00000007f8b15000, 0x0000000800000000)
}
Total time for which application threads were stopped: 32.1614890 seconds
In this case I'd expect the GC to stop the application for, at most, .12 seconds. Does anyone have any idea why there is such a huge discrepancy between the application stopped time and the GC time?
UPDATE I've included a longer snippet from the GClog:
{Heap before GC invocations=201 (full 18):
par new generation total 707840K, used 639557K [0x00000004c4e00000, 0x00000004f4e00000, 0x00000004f4e00000)
eden space 629248K, 100% used [0x00000004c4e00000, 0x00000004eb480000, 0x00000004eb480000)
from space 78592K, 13% used [0x00000004eb480000, 0x00000004ebe917e8, 0x00000004f0140000)
to space 78592K, 0% used [0x00000004f0140000, 0x00000004f0140000, 0x00000004f4e00000)
concurrent mark-sweep generation total 12681216K, used 6634693K [0x00000004f4e00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 47232K, used 28977K [0x00000007fae00000, 0x00000007fdc20000, 0x0000000800000000)
2011-04-12T18:42:16.450+0000: 12770.352: [GC 12770.352: [ParNew
Desired survivor size 40239104 bytes, new threshold 15 (max 15)
- age 1: 3701392 bytes, 3701392 total
- age 2: 517360 bytes, 4218752 total
- age 3: 2113840 bytes, 6332592 total
- age 4: 3489008 bytes, 9821600 total
: 639557K->12278K(707840K), 0.0948620 secs] 7274251K->6646972K(13389056K) icms_dc=0 , 0.0949550 secs] [Times: user=0.00 sys=0.00, real=0.10 secs]
Heap after GC invocations=202 (full 18):
par new generation total 707840K, used 12278K [0x00000004c4e00000, 0x00000004f4e00000, 0x00000004f4e00000)
eden space 629248K, 0% used [0x00000004c4e00000, 0x00000004c4e00000, 0x00000004eb480000)
from space 78592K, 15% used [0x00000004f0140000, 0x00000004f0d3dbf0, 0x00000004f4e00000)
to space 78592K, 0% used [0x00000004eb480000, 0x00000004eb480000, 0x00000004f0140000)
concurrent mark-sweep generation total 12681216K, used 6634693K [0x00000004f4e00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 47232K, used 28977K [0x00000007fae00000, 0x00000007fdc20000, 0x0000000800000000)
}
Total time for which application threads were stopped: 0.0952670 seconds
Application time: 5.0191420 seconds
Total time for which application threads were stopped: 0.0001670 seconds
Application time: 2.2198130 seconds
Total time for which application threads were stopped: 0.0004480 seconds
Application time: 5.5295710 seconds
Total time for which application threads were stopped: 0.0001950 seconds
Application time: 0.0097140 seconds
Total time for which application threads were stopped: 0.0000530 seconds
Application time: 7.1299730 seconds
Total time for which application threads were stopped: 0.0001610 seconds
Application time: 12.0198160 seconds
Total time for which application threads were stopped: 0.0001550 seconds
Application time: 1.5298610 seconds
Total time for which application threads were stopped: 0.0001630 seconds
Application time: 0.0097650 seconds
Total time for which application threads were stopped: 0.0000590 seconds
Application time: 0.1199330 seconds
Total time for which application threads were stopped: 0.0000550 seconds
Application time: 10.0998880 seconds
Total time for which application threads were stopped: 0.0003490 seconds
Application time: 1.0093980 seconds
Total time for which application threads were stopped: 0.0003040 seconds
Application time: 0.6099830 seconds
Total time for which application threads were stopped: 0.0001700 seconds
Application time: 0.0298490 seconds
Total time for which application threads were stopped: 0.0000930 seconds
Application time: 0.2025080 seconds
Total time for which application threads were stopped: 0.0023430 seconds
Application time: 0.0190510 seconds
Total time for which application threads were stopped: 0.0003010 seconds
Application time: 0.1793310 seconds
Total time for which application threads were stopped: 0.0002150 seconds
Application time: 0.0560830 seconds
Total time for which application threads were stopped: 0.0010000 seconds
Application time: 1.7072960 seconds
{Heap before GC invocations=202 (full 18):
par new generation total 707840K, used 641302K [0x00000004c4e00000, 0x00000004f4e00000, 0x00000004f4e00000)
eden space 629248K, 99% used [0x00000004c4e00000, 0x00000004eb447fd8, 0x00000004eb480000)
from space 78592K, 15% used [0x00000004f0140000, 0x00000004f0d3dbf0, 0x00000004f4e00000)
to space 78592K, 0% used [0x00000004eb480000, 0x00000004eb480000, 0x00000004f0140000)
concurrent mark-sweep generation total 12681216K, used 6634693K [0x00000004f4e00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 47232K, used 29039K [0x00000007fae00000, 0x00000007fdc20000, 0x0000000800000000)
2011-04-12T18:43:04.052+0000: 12817.954: [GC 12817.955: [ParNew
Desired survivor size 40239104 bytes, new threshold 4 (max 15)
- age 1: 36162128 bytes, 36162128 total
- age 2: 2531664 bytes, 38693792 total
- age 3: 420840 bytes, 39114632 total
- age 4: 2052544 bytes, 41167176 total
- age 5: 3484040 bytes, 44651216 total
: 641302K->44375K(707840K), 0.2037750 secs] 7275996K->6679069K(13389056K) icms_dc=0 , 0.2039060 secs] [Times: user=0.00 sys=0.00, real=0.21 secs]
Heap after GC invocations=203 (full 18):
par new generation total 707840K, used 44375K [0x00000004c4e00000, 0x00000004f4e00000, 0x00000004f4e00000)
eden space 629248K, 0% used [0x00000004c4e00000, 0x00000004c4e00000, 0x00000004eb480000)
from space 78592K, 56% used [0x00000004eb480000, 0x00000004edfd5f88, 0x00000004f0140000)
to space 78592K, 0% used [0x00000004f0140000, 0x00000004f0140000, 0x00000004f4e00000)
concurrent mark-sweep generation total 12681216K, used 6634693K [0x00000004f4e00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 47232K, used 29039K [0x00000007fae00000, 0x00000007fdc20000, 0x0000000800000000)
}
Total time for which application threads were stopped: 0.2043140 seconds
Application time: 0.0001040 seconds
Total time for which application threads were stopped: 0.0000890 seconds
Application time: 4.0399400 seconds
{Heap before GC invocations=203 (full 18):
par new generation total 707840K, used 673623K [0x00000004c4e00000, 0x00000004f4e00000, 0x00000004f4e00000)
eden space 629248K, 100% used [0x00000004c4e00000, 0x00000004eb480000, 0x00000004eb480000)
from space 78592K, 56% used [0x00000004eb480000, 0x00000004edfd5f88, 0x00000004f0140000)
to space 78592K, 0% used [0x00000004f0140000, 0x00000004f0140000, 0x00000004f4e00000)
concurrent mark-sweep generation total 12681216K, used 6634693K [0x00000004f4e00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 47232K, used 29040K [0x00000007fae00000, 0x00000007fdc20000, 0x0000000800000000)
2011-04-12T18:44:23.258+0000: 12897.160: [GC 12897.160: [ParNew
Desired survivor size 40239104 bytes, new threshold 1 (max 15)
- age 1: 46278472 bytes, 46278472 total
- age 2: 23234248 bytes, 69512720 total
- age 3: 2144024 bytes, 71656744 total
- age 4: 391464 bytes, 72048208 total
: 673623K->78592K(707840K), 0.1607580 secs] 7308317K->6752580K(13389056K) icms_dc=0 , 0.1608520 secs] [Times: user=0.00 sys=0.00, real=0.16 secs]
Heap after GC invocations=204 (full 18):
par new generation total 707840K, used 78592K [0x00000004c4e00000, 0x00000004f4e00000, 0x00000004f4e00000)
eden space 629248K, 0% used [0x00000004c4e00000, 0x00000004c4e00000, 0x00000004eb480000)
from space 78592K, 100% used [0x00000004f0140000, 0x00000004f4e00000, 0x00000004f4e00000)
to space 78592K, 0% used [0x00000004eb480000, 0x00000004eb480000, 0x00000004f0140000)
concurrent mark-sweep generation total 12681216K, used 6673988K [0x00000004f4e00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 47232K, used 29040K [0x00000007fae00000, 0x00000007fdc20000, 0x0000000800000000)
}
Total time for which application threads were stopped: 75.1222720 seconds
Application time: 0.0660880 seconds
Total time for which application threads were stopped: 0.0001500 seconds
Application time: 0.0001110 seconds
Total time for which application threads were stopped: 0.0000620 seconds
Application time: 0.0000320 seconds
Total time for which application threads were stopped: 0.0000620 seconds
Application time: 0.5790490 seconds
Total time for which application threads were stopped: 0.0151370 seconds
Application time: 0.0241790 seconds
Total time for which application threads were stopped: 0.0111420 seconds
Application time: 0.1488160 seconds
Total time for which application threads were stopped: 0.0168360 seconds
Application time: 0.0237110 seconds
Total time for which application threads were stopped: 0.0008580 seconds
Application time: 0.0594260 seconds
Total time for which application threads were stopped: 0.0179450 seconds
Application time: 0.0331740 seconds
Total time for which application threads were stopped: 0.0072850 seconds
Application time: 1.2617710 seconds
{Heap before GC invocations=204 (full 18):
par new generation total 707840K, used 707840K [0x00000004c4e00000, 0x00000004f4e00000, 0x00000004f4e00000)
eden space 629248K, 100% used [0x00000004c4e00000, 0x00000004eb480000, 0x00000004eb480000)
from space 78592K, 100% used [0x00000004f0140000, 0x00000004f4e00000, 0x00000004f4e00000)
to space 78592K, 0% used [0x00000004eb480000, 0x00000004eb480000, 0x00000004f0140000)
concurrent mark-sweep generation total 12681216K, used 6673988K [0x00000004f4e00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 47232K, used 29089K [0x00000007fae00000, 0x00000007fdc20000, 0x0000000800000000)
2011-04-12T18:44:25.690+0000: 12899.592: [GC 12899.592: [ParNew
Desired survivor size 40239104 bytes, new threshold 1 (max 15)
- age 1: 42048296 bytes, 42048296 total
: 707840K->58684K(707840K), 0.1654070 secs] 7381828K->6802196K(13389056K) icms_dc=0 , 0.1655450 secs] [Times: user=0.00 sys=0.00, real=0.17 secs]
Heap after GC invocations=205 (full 18):
par new generation total 707840K, used 58684K [0x00000004c4e00000, 0x00000004f4e00000, 0x00000004f4e00000)
eden space 629248K, 0% used [0x00000004c4e00000, 0x00000004c4e00000, 0x00000004eb480000)
from space 78592K, 74% used [0x00000004eb480000, 0x00000004eedcf320, 0x00000004f0140000)
to space 78592K, 0% used [0x00000004f0140000, 0x00000004f0140000, 0x00000004f4e00000)
concurrent mark-sweep generation total 12681216K, used 6743511K [0x00000004f4e00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 47232K, used 29089K [0x00000007fae00000, 0x00000007fdc20000, 0x0000000800000000)
}
Total time for which application threads were stopped: 0.1709510 seconds
Application time: 1.0183040 seconds
Total time for which application threads were stopped: 0.0009220 seconds
Application time: 0.0209220 seconds
Total time for which application threads were stopped: 0.0009260 seconds
Application time: 0.0181760 seconds
Total time for which application threads were stopped: 0.0002780 seconds
Application time: 0.0773630 seconds
Total time for which application threads were stopped: 0.0019720 seconds
Application time: 0.0045340 seconds
Total time for which application threads were stopped: 0.0001400 seconds
Application time: 0.0245930 seconds
Total time for which application threads were stopped: 0.0004070 seconds
Application time: 0.4811530 seconds
Total time for which application threads were stopped: 0.0043100 seconds
Application time: 0.0015090 seconds
Total time for which application threads were stopped: 0.0052290 seconds
Application time: 0.0264570 seconds
Total time for which application threads were stopped: 0.0072530 seconds
Application time: 0.0135600 seconds
Total time for which application threads were stopped: 0.0121400 seconds
Application time: 0.0278510 seconds
Total time for which application threads were stopped: 0.0078060 seconds
Application time: 0.0136080 seconds
Total time for which application threads were stopped: 0.0121390 seconds
Application time: 0.0066360 seconds
Total time for which application threads were stopped: 0.0004330 seconds
Application time: 0.0107480 seconds
Total time for which application threads were stopped: 0.0001900 seconds
Application time: 0.0027230 seconds
Total time for which application threads were stopped: 0.0132610 seconds
Application time: 0.0126600 seconds
Total time for which application threads were stopped: 0.0003420 seconds
Application time: 0.5384840 seconds
{Heap before GC invocations=205 (full 18):
par new generation total 707840K, used 687932K [0x00000004c4e00000, 0x00000004f4e00000, 0x00000004f4e00000)
eden space 629248K, 100% used [0x00000004c4e00000, 0x00000004eb480000, 0x00000004eb480000)
from space 78592K, 74% used [0x00000004eb480000, 0x00000004eedcf320, 0x00000004f0140000)
to space 78592K, 0% used [0x00000004f0140000, 0x00000004f0140000, 0x00000004f4e00000)
concurrent mark-sweep generation total 12681216K, used 6743511K [0x00000004f4e00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 47232K, used 29092K [0x00000007fae00000, 0x00000007fdc20000, 0x0000000800000000)
2011-04-12T18:44:28.223+0000: 12902.125: [GC 12902.125: [ParNew
Desired survivor size 40239104 bytes, new threshold 15 (max 15)
- age 1: 5906520 bytes, 5906520 total
: 687932K->45478K(707840K), 0.2139570 secs] 7431444K->6860368K(13389056K) icms_dc=0 , 0.2142010 secs] [Times: user=0.00 sys=0.00, real=0.21 secs]
Heap after GC invocations=206 (full 18):
par new generation total 707840K, used 45478K [0x00000004c4e00000, 0x00000004f4e00000, 0x00000004f4e00000)
eden space 629248K, 0% used [0x00000004c4e00000, 0x00000004c4e00000, 0x00000004eb480000)
from space 78592K, 57% used [0x00000004f0140000, 0x00000004f2da9b30, 0x00000004f4e00000)
to space 78592K, 0% used [0x00000004eb480000, 0x00000004eb480000, 0x00000004f0140000)
concurrent mark-sweep generation total 12681216K, used 6814889K [0x00000004f4e00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 47232K, used 29092K [0x00000007fae00000, 0x00000007fdc20000, 0x0000000800000000)
}
UPDATE
I switched my application over to a multi-core machine to help alleviate the pain from the CMS collection and now, in the steady state, I don't see any pauses. However, during updates to my search index, a lot of large memory structures are generated quickly and application suffers from long STW pauses again. I'm not sure how to avoid this as those objects probably should reside in the tenured generation, I increased the size of my survivor spaces (-XX:SurvivorRatio=8) but am still observing long pauses. See below for GC log:
{Heap before GC invocations=103 (full 48):
par new generation total 921600K, used 912322K [0x000000050ce00000, 0x000000054b600000, 0x000000054b600000)
eden space 819200K, 99% used [0x000000050ce00000, 0x000000053e776380, 0x000000053ee00000)
from space 102400K, 97% used [0x000000053ee00000, 0x0000000544f7a630, 0x0000000545200000)
to space 102400K, 0% used [0x0000000545200000, 0x0000000545200000, 0x000000054b600000)
concurrent mark-sweep generation total 11264000K, used 6486203K [0x000000054b600000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 48516K, used 29211K [0x00000007fae00000, 0x00000007
fdd61000, 0x0000000800000000)
2011-04-14T15:15:25.322+0000: 2358.684: [GC 2358.684: [ParNew
Desired survivor size 52428800 bytes, new threshold 1 (max 15)- age 1: 102164088 bytes, 102164088 total
: 912322K->99941K(921600K), 30.0146400 secs] 7398525K->6818473K(12185600K), 30.0147850 se
cs] [Times: user=0.00 sys=0.00, real=30.02 secs]
Heap after GC invocations=104 (full 48): par new generation total 921600K, used 99941K [0x000000050ce00000, 0x000000054b600000,
0x000000054b600000)
eden space 819200K, 0% used [0x000000050ce00000, 0x000000050ce00000, 0x000000053ee00000)
from space 102400K, 97% used [0x0000000545200000, 0x000000054b399710, 0x000000054b6000
00)
to space 102400K, 0% used [0x000000053ee00000, 0x000000053ee00000, 0x00000005452000
00) concurrent mark-sweep generation total 11264000K, used 6718531K [0x000000054b600000, 0x0
0000007fae00000, 0x00000007fae00000) concurrent-mark-sweep perm gen total 48516K, used 29211K [0x00000007fae00000, 0x00000007
fdd61000, 0x0000000800000000)}
Total time for which application threads were stopped: 30.0152850 seconds
Application time: 0.9450500 seconds
Upvotes: 3
Views: 8839
Reputation: 8476
added comment to longer GC log
I think your survivor spaces are too small, note that eden is full with survivor at 56% before and eden is not full after but survivor is at 100%. This means it is forced into a completely non concurrent tenured collection, that is reported as a young gc because it was not triggered by CMS, because it has nowhere to put the extra garbage but tenured. Only the tenured collector can do stuff to tenured and hence a full GC is triggered. The remedy is making eden bigger and/or making the survivor spaces bigger. You can use SurvivorRatio
to do this, e.g -XX:SurvivorRatio=8
means set each survivor space to be 1/8 of eden.
Stopped time is really "time the JVM spent at a safepoint", you can use PrintSafepointStatistics
to determine what was going on during the safepoint. This post has a link to a list of the operations that cause the VM to reach a safepoint which, if you peruse it, will show there are many and varied ways it could sit there doing "nothing". Don't rule out OS clock shifts either as the times are recorded via a timeofday aware timestamp.
btw pre hotspot17 those timestamps can be inaccurate. The bug is described in Bug ID:6782663 which was fixed in hs17 which was part of 6u21 (as per the release notes). Since you're on 6u20, it (the 30s) may not even be a real value.
Upvotes: 3
Reputation: 72049
There are cases where the concurrent collector will fall back to a full GC that pauses everything until complete. If I recall correctly, a small eden can trigger it if the application is creating a lot of objects that need to be collected and it can't keep up. You might want to try specifying a larger eden (or use a ratio so it scales properly with memory). You can try adding these config options to get more GC details, if you don't have them already:
-Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps
I managed to eliminate the full GC events we were seeing with a large deployment by reviewing the GC log and iterating on the various options. Every application is different, however here they are in case they can help you too:
-Xms24g -Xmx24g -XX:MaxPermSize=256m -XX:NewRatio=4 -XX:SurvivorRatio=8
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC
-XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled
-XX:+CMSScavengeBeforeRemark -XX:CMSInitiatingOccupancyFraction=68
Since we're using under 32GB we also use -XX:+UseCompressedOops
to reduce the overhead of 64-bit addressing. Additionally we use -XX:+UseLargePages
for better performance, however you have to enable large page support for your OS first.
Upvotes: 1