haipeng31
haipeng31

Reputation: 637

Some points about implementing a garbage collector

I'm trying to implement a simple programming language. I want to make the user of it not have to manage the memory, so I decided to implement a garbage collector. The simplest way I can think of after checking out some material is like this:

There are two kinds of heap zones. The first is for storing big objects(bigger than 85,000 bytes), the other is for small objects. In the following I use BZ for the first, SZ for the second.

The BZ uses the mark and sweep algorithm, because moving a big object is expensive. I don't compact, so there will be fragmentation.

The SZ uses generations with mark-compact. There are three generations: 0, 1, and 2. Allocation requests go directly to generation 0, and when generation 0 is full, I will do garbage collection on it, the survivals will be promoted to generation 1. generation 1 and generation 2 will also do garbage collection when full.

When the virtual machine starts, it will allocate a big memory from the OS to be used as a heap zone in the virtual machine The BZ and every generation in SZ will occupy a fixed portion of memory, and when an allocation request can't be satisfied, the virtual machine will give an error OTM (out of memory). This has a problem: when the virtual machine starts, even getting the program to run on it should need only a little memory, but it still uses a lot. A better way will be for the virtual machine get a small amount of memory from the OS, and then when the program needs more memory the virtual machine will get more from the OS. I am going to allocate a larger memory for the generation 2 in SZ, and then copy all the things in generation 2 to the new memory zone. And do the same thing for the BZ.

The other problem occurs when the BZ is full and SZ is empty, I would be silly not be able to satisfy a big object allocation request even though we in fact have enough free heap size for the big object in SZ. How to deal with this problem?

Upvotes: 2

Views: 236

Answers (1)

Imposter
Imposter

Reputation: 2686

I am trying to understand your methodology . Since you didnt mention your strategy completely, i am having some assumption.

NOTE : Following is my hypothetical analysis and may not be practically possible .So please skip answer if you don't have time .

Your are trying to use Generational GC with changes ; There are classifications of 2 types

(1) big size objects BZ and

(2)small size objects SZ .

SZ perform generational GC with compaction( CMS )

From above understanding we know that the SZG2 has long lived objects .I am expecting that GC in szG2 is not as frequent as SZG1 or SZG0 since long lived object generally tend to live longer so less dead collection and size of SZG2 will be more as time lapses, so GC'ing takes lot of time traversing all elements ,so doing frequent GC on SZG2 is less productive(long GC spike ,so notable delay for user) compared to that of SZG1 or SZG0 .

And similarly for BZ there might be large memory requirement (as big object occupy more space) . So in order to address you query

"The other problem occurs when the BZ is full and SZ is empty, I would be silly not be able to satisfy a big object allocation request even though we in fact have enough free heap size for the big object in SZ. How to deal with this problem?"

Since you said that " when the program needs more memory the virtual machine will get more from the OS"

I have a small idea, may not be productive or may not be possible to implement and completely dependent on your implementation of GCHeap structure . Let your virtual machine allocate memory like follows

enter image description here

Coming to the possibility (i borrowed idea from "memory segments of program" as shown below) below is possible at low level .

enter image description here

As shown in above figure a GCHeap structure has to be defined in such a way that SZG0 and BZ expand towards each other .For implementing GCHeap structure mentioned in figure a, figure b we need to have proper convention of memory growth in zones SZG[0-2] size and BZ .

So if you want to divide your heap for application into multiple heaps then you can pile figure A over figure B to decrease fragmentation (when i say fragmentation it means "when the BZ is full and SZ is empty, I would be silly not be able to satisfy a big object allocation request even though we in fact have enough free heap size for the big object in SZ." ).

So effective structure will be

            B
            |
            B
            |
            B
            |
            B
            |
            A

Now its completely depends on heuristics to decide whether to consider GCHeap data structure in multiple GCHeap structures like GCHeapA , GCHeapB or take it as single heap based on requirement .

If you don't want to have multiple heaps then you can use figure A with small correction by Setting base address of **SZG2** to top of heap

The key reason behind Figure a is as follows : we know that SZG0 get GC'ed frequently so it can have more free space compared to SZG1 and SZG2 ,since dead objects are removed and survived object gets moved to SZG1 and SZG2 .So if allocation of BZ is more it can grow towards SZG0 .

In the figure a base address of SZG1 and SZG2 are contigious because SZG2 is more prone to out of memory error as old generation object tend to live longer and GC'ing doesnt sweep much (NOTE : it is just my assumption and opinion ) so SZG2 is kept bound outwards .

Upvotes: 4

Related Questions