Reputation: 1287
I was watching video Google IO 2008 - Dalvik Virtual Machine Internals to understand how Dalvik VM works and why those people has preferred Dalvik VM over JVM for android. I found that android uses separate memory for Garbage information about the objects , opposed to the JVM where we have mark bits(bits telling whether object is able for garbagfe collection or not) together with objects.
Can anybody tell me in detail what are the advantages and disadvantages of having separate memory for marks bits and not having separate memory for mark bits ?
I was unable to get this difference by watching video.
Upvotes: 2
Views: 1405
Reputation: 20382
Separate mark bits work by having an array of bits where each bit represents an address in the heap that can start an object. For example, suppose the heap is 65536 bytes and all objects are aligned at 16 byte boundaries, then there are 4096 addresses in the heap that can be the start of an object. This means the array needs to contain 4096 bits, which can be efficiently stored as 512 bytes or 64 64bit sized unsigned integers.
In-object mark bits works by having one bit of each header of each object be set to 1 if the object is marked and 0 otherwise. Note that this requires each object to have a dedicated header area. Runtimes such as the JVM and .NET all add headers to objects so you essentially get the space for the mark bit for free.
But it doesn't work for conservative collectors which don't have full control of the environment they are running in, such as the Boehm GC. They can misidentify integers as pointers, so for them modifying anything in the mutators data heap is risky.
Mark & sweep garbage collection is divided into two phases: marking and sweeping. Marking using in-object mark bits is straight-forward (pseudo-code):
if not obj.is_marked():
obj.mark()
mark_stack.append(obj)
Using a separate array for storing mark bits, we have to convert the objects address and size to indices in the bit array and set the corresponding bits to 1:
obj_bits = obj.size_in_bytes() / 16
bit_idx = (obj - heap.start_address()) / 16
if not bitarr.bit_set(bit_idx):
bitarr.set_range(bit_idx, obj_bits)
mark_stack.append(obj)
So in our example, if an object is 128 bytes long, 8 bits will be set in the bit array. Clearly, using in-object mark bits is much simpler.
But separate mark bits gain some momentum when sweeping. Sweeping involves scanning through the whole heap and finding continuous regions of memory which is unmarked and therefore can be reclaimed. Using in-object mark bits, it would roughly look like this:
iter = heap.start_address()
while iter < heap.end_address():
# Scan til the next unmarked object
while iter.is_marked():
iter.unmark()
iter += iter.size()
if iter == heap.end_address():
return
# At an unmarked block
start = iter
# Scan til the next marked object
while iter < heap.end_address() and not iter.is_marked():
iter += iter.size()
size = iter - start
# Reclaim the block
heap.reclaim(start, size)
Note how the iteration jumps from object to object in the iter += iter.size()
lines. This means that the sweep phase running time is proportional to the total number of live and garbage objects.
Using separate mark bits, you would do roughly the same loop except that large swathes of garbage objects would be flown over without "stopping" on each of them.
Consider the 65536 heap again. Suppose it contains 4096 objects that are all garbage. Iterating the 64 64bit integers in the mark bits array and seeing that they are all 0 is obviously very fast. Therefore the sweeping phase can potentially be much faster with separate mark bits.
But there is another wrinkle! In any mark and sweep collector, the running time is dominated by the mark phase and not the sweep phase which is usually very quick. So the verdict is still out. Some prefer separate mark bits, other prefer in-object ones. To the best of my knowledge, no one has yet been able to show which approach is superior to the other.
Upvotes: 1
Reputation:
Some advantages of a separate bitmap:
fork()
on a Unix system, a separate bitmark makes better use of copy-on-write: Pages containing objects might remain shared.Some advantages of in-object mark bits:
Upvotes: 3