rontu
rontu

Reputation: 63

Java ConcurrentMarkSweep garbage collector not happening

Environment Details: OS: Linux RedHat Java: JRE 6 Update 21

I am using following GC setting for my app.

-server -d64 -Xms8192m -Xmx8192m -javaagent:lib/instrum.jar -XX\:MaxPermSize=256m -XX\:+UseParNewGC -X\:+ParallelRefProcEnabled -XX\:+UseConcMarkSweepGC -XX\:MaxGCPauseMillis=250 -XX\:+CMSIncrementalMode -XX\:+CMSIncrementalPacing -XX\:+CMSParallelRemarkEnabled -verbose\:gc -Xloggc\:/tmp/my-gc.log -XX\:DisableExplicitGC -XX\:+PrintGCTimeStamps -XX\:+PrintGCDetails -XX\:+UseCompressedOops

With there setting, there is single Full GC at the begining of application

2.946: [Full GC 2.946: [CMS: 0K->7394K(8111744K), 0.1364080 secs] 38550K->7394K(8360960K), [CMS Perm : 21247K->21216K(21248K)], 0.1365530 secs] [Times: user=0.10 sys=0.04, real=0.14 secs] 

Which is followed by a 4-5 successful of CMS collections, But after this there is no trace of CMS in logs, there are entries on only minor collections.

379022.293: [GC 379022.293: [ParNew: 228000K->4959K(249216K), 0.0152000 secs] 7067945K->6845720K(8360960K) icms_dc=0 , 0.0153940 secs]

The heap is growing continuously and it has reached 7GB. We have to restart the application as we can not afford OOM or any kind of breakdown in production system.

I am not able to understand as to why CMS collector has stopped cleaning. Any clues/suggestions are welcome. Thanks in Advance.

====================================================================================== Updated 23rd Jan.

Thanks everyone for the responses till now. I have setup the application in test environment and tested the app with following set of JVM options:

Option #1

-server -d64 -Xms8192m -Xmx8192m -javaagent\:instrum.jar -XX\:MaxPermSize\=256m -XX\:+UseParNewGC -XX\:+UseConcMarkSweepGC -verbose\:gc -Xloggc\:my-gc.log -XX\:+PrintGCTimeStamps -XX\:+PrintGCDetails

Option #2

-server -d64 -Xms8192m -Xmx8192m -javaagent\:instrum.jar -XX\:MaxPermSize\=256m -XX\:+UseParNewGC -XX\:+UseConcMarkSweepGC -verbose\:gc -Xloggc\:my-gc.log -XX\:+DisableExplicitGC -XX\:+PrintGCTimeStamps -XX\:+PrintGCDetails

I ran the test with both settings for 2 days in parallel. These are my observations:

Option #1 The heap memory is stable but there are 90 ConcurrentMarkSweep collections and JVM spent 24 minutes. That’s too high. And I see following lines in GC logs and the pattern continues every one hour...

318995.941: [GC 318995.941: [ParNew: 230230K->8627K(249216K), 0.0107540 secs] 5687617K->5466913K(8360960K), 0.0109030 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]
319050.363: [GC 319050.363: [ParNew: 230195K->9076K(249216K), 0.0118420 secs] 5688481K->5468316K(8360960K), 0.0120470 secs] [Times: user=0.12 sys=0.01, real=0.01 secs]
319134.118: [GC 319134.118: [ParNew: 230644K->8503K(249216K), 0.0105910 secs] 5689884K->5468704K(8360960K), 0.0107430 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]
319159.250: [Full GC (System) 319159.250: [CMS: 5460200K->5412132K(8111744K), 19.1981050 secs] 5497326K->5412132K(8360960K), [CMS Perm : 72243K->72239K(120136K)], 19.1983210 secs] [Times: user=19.14 sys=0.06, real=19.19 secs]

I don’t see the concurrent mark and sweep logs. Does this mean CMS switched to throughput collector? If so why?

Option #2:

Since I see the Full GC (System) logs, I thought of adding -XX\:+DisableExplicitGC. But with that option the collection is not happening and the current heap size is 7.5G. What I am wondering is why CMS is doing the Full GC instead of concurrent collection.

Upvotes: 1

Views: 3112

Answers (4)

Greg Bowyer
Greg Bowyer

Reputation: 1684

CMS is running for you :P

You are using incremental mode on CMS (although really you should not bother as its likely punishing your throughput)

The icms_dc in your posted log line is a give away, the only thing that logs this in the JVM is ... the CMS collector, its saying for that GC run you did a small amount of tenure cleanup interwoven with the application.

This part of your log relates to parallel new (the give away there is the heap size)

379022.293: [GC 379022.293: [ParNew: 228000K->4959K(249216K), 0.0152000 secs]

this part is incremenatal CMS (iCMS)

7067945K->6845720K(8360960K) icms_dc=0 , 0.0153940 secs]

I would ask, why are you using iCMS, do you have a lot of Soft/Weak/Phantom references (or why are you using the ParallelRefProcEnabled flag) and have you actually seen an Out of memory, or insufferable pause.

Try backing down to CompressedOops, ParNewGC and CMS without anything else fancy and see if that works out for you.

Upvotes: 1

Peter Lawrey
Peter Lawrey

Reputation: 533530

When you set the maximum size, it allocates that amount of virtual memory immediately.

When you set the minimum size, it has already allocated the maximum size. All the minimum size does is to take minimal steps to free up memory until this maximum is reached. This could be reducing the number of full GCs because you told it to use up to 8 GB freely.

You have a lot of options turned on (some of them the default) I suggest you strip back to a minimum set as they can have odd interactions when you turn lots of.

I would start with (assuming you have Solaris)

-mx8g -javaagent:lib/instrum.jar -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -verbose\:gc -Xloggc\:/tmp/my-gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails

The options -server is the default on server class machines, -XX:+UseCompressedOops is the default on recent versions of Java and -XX:MaxGCPauseMillis=250 is just a hint.

http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

Upvotes: 0

Niranjan Bhat B
Niranjan Bhat B

Reputation: 1

I can see that the initial heap size -Xms is :8192m and max heap size is -Xmx8192m, which might be one of the reasons why GC is still waiting to start sweeping. I would suggest to decrease the heap size and then check if the GC kicks in.

Upvotes: 0

Stephen C
Stephen C

Reputation: 718826

This is a theory ...

I suspect that those CMS collections were not entirely successful. The event at 12477.056 looks like the CMS might have decided that it is not going to be able to work properly due to the 'pre-clean' step taking too long.

If that caused the CMS to decide to switch off, then I expect it will revert to using the classic "throughput" GC algorithm. And there's a good chance it would wait until the heap is full and then it would run a full GC. In short, if you'd just let it continue it would have been OK (modulo that you'd get big GC pauses every now and then.)


I suggest you run your application on a test server with the same heap size and other GC parameters, and see what happens when the server hits the limit. Does it actually throw an OOME?

Upvotes: 1

Related Questions