Why does the CMS collector collect root references from the young generation during the Initial Mark phase?

As far as I know, the CMS collector collects the old generation and works in conjunction with the ParNew collector (which collects the young generation). I find it hard to understand exactly how CMS works, but here is how I see it:

1) Initial Mark. The collector looks for root references. Since it is an old-generation collector, it should only scan the old generation.

2) Concurrent Mark. Once all the root references have been found, concurrent marking starts. All objects transitively reachable from the objects marked in the first phase are marked in this phase.

3) Concurrent Preclean. The GC looks at the objects in the CMS heap that were updated by promotions from the young generation, by new allocations, or by mutators during the preceding concurrent marking phase. [Please confirm whether I am right on these two points: 1) the only purpose of this phase is to do in advance part of the work that would otherwise have to be done in the next phase (Remark); 2) there is some process that tracks which references were changed during the Concurrent Mark phase.]

4) Remark. The GC stops the world and then looks at the objects in the CMS heap that were updated by promotions from the young generation, by new allocations, or by mutators during the concurrent preclean.

But today I read this article:

Initial mark: During initial mark, CMS should collect all root references to start marking of old space. This includes references from thread stacks and references from young space. References from stacks are usually collected very quickly (less than 1 ms), but the time to collect references from young space depends on the size of the objects in young space. Normally initial mark starts right after a young space collection, so Eden space is empty and the only live objects are in one of the survivor spaces. Survivor space is usually small, and initial mark after a young space collection often takes less than a millisecond. But if initial mark is started when Eden is full, it may take quite long (usually longer than the young space collection itself). Once a CMS collection is triggered, the JVM may wait some time for a young collection to happen before it starts initial marking. The JVM configuration option -XX:CMSWaitDuration= can be used to set how long CMS will wait for a young space collection before the start of initial marking. If you want to avoid long initial marking pauses, you should configure this time to be longer than the typical period of young collections in your application.

Remark: Most of the marking is done in parallel with the application, but it may not be accurate because the application may modify the object graph during marking. When concurrent marking is finished, the garbage collector should stop the application and repeat marking to be sure that all reachable objects are marked as alive. But the collector doesn't have to traverse the whole object graph; it should traverse only references modified since the start of marking (actually since the start of the preclean phase). The card table (see card marking write barrier) is used to identify modified portions of memory in old space, but thread stacks and young space have to be scanned once again. Usually most of the remark phase is spent scanning young space. This time will be much shorter if we collect garbage in young space before starting remark. We can instruct the JVM to always force a young space collection before CMS remark. Use the JVM parameter -XX:+CMSScavengeBeforeRemark to enable this option. Even if young space is empty, the remark phase still has to scan through modified references in old space; this usually takes time close to a normal young collection pause (because the scanning of old space done during a young collection is similar to the scanning required for remark).

http://blog.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots_02.html

I don't understand why CMS needs to scan the young generation. Why is that needed for old-generation garbage collection?

Upvotes: 2

Answers (2)

Alexey Ragozin

Reputation: 8379

The Java heap is split into two parts that are collected independently: old space and young space.

To collect either space, you need to find all references into it that come from outside that space. They are:

local variables on thread stacks
literals embedded in JIT-compiled code blobs
ALL references from the other space

There is no difference between the two cases: an old collection has to scan young space, and a young collection has to scan old space.
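A minimal sketch of why the young-space scan matters, using a toy heap model. The class, field, and method names below are hypothetical and do not correspond to HotSpot internals. It marks old objects starting from the stack roots, optionally adding the old objects referenced from young objects as extra roots:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical names, not how HotSpot represents objects.
final class OldGenMarkingSketch {

    static final class Obj {
        final String name;
        final boolean old;                        // true = lives in old space
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        Obj(String name, boolean old) { this.name = name; this.old = old; }
    }

    // Marks old objects. The traversal itself never leaves old space, so any
    // reference coming from young space has to be supplied as an extra root.
    static Set<String> markOld(List<Obj> stackRoots, List<Obj> youngObjects, boolean scanYoung) {
        Deque<Obj> work = new ArrayDeque<>();
        for (Obj r : stackRoots) {
            if (r.old) work.push(r);
        }
        if (scanYoung) {
            for (Obj y : youngObjects) {
                for (Obj r : y.refs) {
                    if (r.old) work.push(r);
                }
            }
        }
        Set<String> marked = new TreeSet<>();
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (!marked.add(o.name)) continue;    // already visited
            for (Obj r : o.refs) {
                if (r.old) work.push(r);
            }
        }
        return marked;
    }

    // Scenario: the stack holds only a young object, which points into old space.
    static Set<String> demo(boolean scanYoung) {
        Obj request = new Obj("request", false);  // young, referenced from the stack
        Obj cache = new Obj("cache", true);       // old, referenced only from "request"
        Obj entry = new Obj("entry", true);       // old, referenced from "cache"
        Obj garbage = new Obj("garbage", true);   // old, referenced by nothing
        request.refs.add(cache);
        cache.refs.add(entry);
        garbage.refs.add(entry);
        return markOld(Arrays.asList(request), Arrays.asList(request), scanYoung);
    }

    public static void main(String[] args) {
        System.out.println("without young scan: " + demo(false)); // []
        System.out.println("with young scan:    " + demo(true));  // [cache, entry]
    }
}
```

Without the young-space scan, the live objects "cache" and "entry" stay unmarked and would be freed by mistake; "garbage" is unmarked in both runs, as it should be.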

A card-table write barrier is used so that the whole old space does not have to be scanned for each young collection (only a small portion of old space contains links to young space, and the write barrier helps track these regions).

But there is no card table for young space, so an old collection has to scan its entire memory range.
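To make the card table idea concrete, here is a toy sketch. The class and its methods are hypothetical, addresses are plain offsets into an imaginary old space, and the 512-byte card size is an assumption made for the illustration. The write barrier marks a card dirty on every reference store, and a young collection then scans only the dirty cards:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names and layout, not HotSpot's implementation.
final class CardTableSketch {
    static final int CARD_SIZE = 512;   // bytes of old space per card (assumed here)
    private final boolean[] dirty;

    CardTableSketch(int oldSpaceBytes) {
        // Round up so that a partial last card is still covered.
        dirty = new boolean[(oldSpaceBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    // Write barrier: called for every reference store into an old-space field.
    void onReferenceStore(int fieldOffset) {
        dirty[fieldOffset / CARD_SIZE] = true;
    }

    int cardCount() {
        return dirty.length;
    }

    // A young collection only has to look at these cards for old-to-young references.
    List<Integer> cardsToScan() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(64 * 1024); // 128 cards
        table.onReferenceStore(100);     // falls into card 0
        table.onReferenceStore(700);     // falls into card 1
        table.onReferenceStore(40_000);  // falls into card 78
        // prints "[0, 1, 78] of 128 cards"
        System.out.println(table.cardsToScan() + " of " + table.cardCount() + " cards");
    }
}
```

Nothing comparable tracks stores into young space in this model, which is the situation described above: the old collection has no shortcut and has to walk all of young space.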

PS: I'm the author of the article you referenced; you can find a few more GC-related articles on my blog.

Upvotes: 2

Guntram Blohm

Reputation: 9819

You might have classes with circular references, say a class A that has a reference to class B, and B has a backreference to A. If you have objects a and b that belong to these classes, and reference each other, gc must remove them when you drop the last reference to them from "outside". The situation can, of course, be much more complicated with the reference loop containing more elements. So gc has to check which elements are reachable from some root, and which ones are referenced, but not reachable, and should be collected.

Now if you have, somewhere in your code

Object a = new A(new B(new C(new D())));

the constructors can take some time before a gets assigned. But you don't want gc to remove the newly created D, just because C's constructor takes a while to run, and a didn't get assigned yet. So you need to scan the young generation as well, to catch objects that are too young to be referenced from the heap.

Upvotes: 2
