Why is a remark phase needed in concurrent GC?

Question

Concurrent GC needs a remark phase. Its role is to mark objects that were modified during the concurrent mark phase. But I think that if we simply mark newly created objects during the concurrent mark phase, there is no need to run a remark phase at all.

The remark phase is needed because of modified objects. There are two kinds of modification: one is the creation of a new object, and the other is a pointer being changed to refer to another object. The new-object problem is easily solved by marking objects as they are created. And a pointer modified to refer to another object is not actually a problem, because

Dead objects cannot revive

A dead object is one that nothing points to. How could it come back to life? So a modified pointer must point to an object that is already marked, which means there is no need to perform a remark.

Someone could say, "Marking each new object on creation is too expensive, so new objects cannot be marked during the concurrent mark phase, and that is why the remark phase is needed." That seems reasonable, but it raises another question: how can remark work without traversing every object from the roots? If the remark phase has to traverse every object from the roots, the work done by the concurrent mark phase is wasted. And if the remark phase traverses only modified objects, the information about which objects were modified has to be saved somewhere. I think that could be much more expensive than just marking.

Am I wrong? I must be, but I have no idea which step of my reasoning is wrong.

maaartinus · Accepted Answer

And a pointer modified to refer to another object is not actually a problem, because

Dead objects cannot revive

They really can't, but do you know which objects are dead? No! Why?

You don't know it after the initial mark phase, as that phase looks only at the roots (such as the thread stacks) and doesn't follow references.

You don't know it after the concurrent mark phase either, as the following may happen:

1. A thread reads the field a.x and stores its value in a register (or on its stack or elsewhere).
2. Then this thread sets a.x = null (or something else).
3. The GC comes and sees null there.
4. Then the thread restores a.x to its previous value.

Now, the GC has missed the object a.x points to. While the above scenario is not very common, it may happen and there are more realistic (and more complicated) scenarios.
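To make the interleaving concrete, here is a minimal single-threaded sketch that replays the steps above in a fixed order. Everything in it is made up for illustration: the Node class, the helper names, and the use of a plain graph walk as the "marker" and a local variable as the "register". A real collector runs in parallel with the application and is far more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.TreeSet;

public class MissedObjectDemo {
    /** Hypothetical heap object with a single reference field x. */
    static final class Node {
        final String name;
        Node x;
        Node(String name) { this.name = name; }
    }

    /** Names of all objects reachable from root at the moment of the call. */
    static Set<String> reachable(Node root) {
        Set<String> seen = new TreeSet<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            // Follow the field only the first time we see an object.
            if (seen.add(n.name) && n.x != null) {
                pending.push(n.x);
            }
        }
        return seen;
    }

    /** Replays the scenario and returns the live objects the marker never saw. */
    static Set<String> missedByMarker() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.x = b;

        Node register = a.x;                // the thread reads a.x into a "register"
        a.x = null;                         // the thread clears the field
        Set<String> marked = reachable(a);  // the marker visits a and sees null
        a.x = register;                     // the thread restores the field

        Set<String> missed = reachable(a);  // what is really live afterwards
        missed.removeAll(marked);           // live, but not marked
        return missed;
    }

    public static void main(String[] args) {
        System.out.println("live but unmarked: " + missedByMarker());
    }
}
```

In this sketch b was never dead at any point, yet the marker did not see it. If nothing looked at a again, b would be treated as garbage.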

So it's necessary to look at the modified memory again, and that is what the remark phase does. Fortunately, the whole memory need not be scanned again, because a card table records which parts of it were modified.
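Here is a rough sketch of that idea, under heavy simplifying assumptions. A real card table keeps one entry per fixed-size memory region and is updated by a write barrier the JIT emits. This toy version keeps one "card" per object and goes through a hypothetical write() method instead. It also ignores changes to the roots themselves. All names are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class CardTableSketch {
    /** Hypothetical heap object with a single reference field x. */
    static final class Node {
        final String name;
        Node x;
        Node(String name) { this.name = name; }
    }

    final Set<Node> marked = new LinkedHashSet<>();
    final Set<Node> dirty = new LinkedHashSet<>();  // stand-in for the card table

    /** Toy write barrier: every reference store also records the modified object. */
    void write(Node holder, Node value) {
        holder.x = value;
        dirty.add(holder);
    }

    /** Marks everything reachable from start that is not marked yet. */
    void markFrom(Node start) {
        Deque<Node> pending = new ArrayDeque<>();
        if (start != null) pending.push(start);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n) && n.x != null) pending.push(n.x);
        }
    }

    /** Remark: rescan only dirty objects already known to be live. */
    int remark() {
        int rescanned = 0;
        for (Node d : dirty) {
            if (marked.contains(d)) {
                markFrom(d.x);  // the field may now point to an unmarked object
                rescanned++;
            }
        }
        dirty.clear();
        return rescanned;
    }

    Set<String> markedNames() {
        Set<String> names = new TreeSet<>();
        for (Node n : marked) names.add(n.name);
        return names;
    }

    /** Replays the clear-and-restore scenario on the heap a -> b -> c. */
    static String run() {
        CardTableSketch gc = new CardTableSketch();
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.x = b;  // heap built before the GC cycle starts, so nothing is dirty yet
        b.x = c;

        Node register = a.x;    // the thread saves the reference
        gc.write(a, null);      // ... and clears the field
        gc.markFrom(a);         // concurrent mark sees only a
        gc.write(a, register);  // the thread restores the field
        String before = gc.markedNames().toString();

        int rescanned = gc.remark();
        return "concurrent=" + before + " remark=" + gc.markedNames()
                + " rescanned=" + rescanned;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The point of the sketch is that remark starts from the one object that was written to rather than from the roots, so the work done by the concurrent phase is kept. Recording a modification costs one extra store per reference write.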

I'm afraid this (otherwise nice) explanation is a bit misleading on this point:

The remark phase is a stop-the-world. CMS cannot correctly determine which objects are alive (mark them live), if the application is running concurrently and keeps changing what is live.

The threads do change what is live, but they also change what you can see as being live. And that's the problem.

This article states it rather clearly:

Part of the work in the remark phase involves rescanning objects that have been changed by an application thread (i.e., looking at the object A to see if A has been changed by the application thread so that A now references another object B and B was not previously marked as live).

I'd say: When you search one room after another, you may miss your glasses when children move them around.

A note concerning the scenario

I'm pretty sure the above scenario is possible; it's just not exactly what a program usually does. For a pretty realistic example, consider:

void swap(Object[] a, int i, int j) {
    Object tmp = a[i];
    a[i] = a[j];
    // Now the original reference a[i] is in a register only.
    a[j] = tmp;
}
