How exactly are objects removed from the heap?

Question

I'm learning about garbage collection in Java and all the different generations. I now know that when an object is not marked after a run, it is swept. I'm curious though: what exactly is that sweeping process?

My understanding is that it removes dead objects from the heap, and that the heap is stored in RAM, but how exactly does the removal work? Does the OS expose a method to remove the data at a specific block of RAM?

I'm guessing it does the equivalent of what the free() function does in C, so I suppose this is a question of what that function does exactly.

rzwitserloot · Accepted Answer

The JVM has a pluggable architecture for GC systems, so what the JVM actually does depends entirely on which GC engine you plug in. It's a bit much to explain the details of every one of them in an SO post, but the basic gist:

The JVM very rarely invokes free or malloc. The JVM is its own little system: it mallocs a boatload of memory and then keeps it, generally forever. (Java is primarily designed for servers and assumes that whatever memory it gets is reserved for it and, in principle, won't be needed by other applications. It'll try to play nice when not on a server, but that's not what the GCs are optimized for.) If your VM created 2GB worth of stuff at some point and later on almost all of that can be GCed, it will be*, but the VM keeps the 2GB it got from the OS. The theory being: Hey, I had to juggle 2GB worth of objects earlier in my lifetime, I'm a server, odds are I'll be asked to do a job that makes me juggle 2GB again sooner rather than later, and free/malloc take time, so why bother with that? I'll just keep 2GB of pages to myself, knowing they're free to write heap stuff into.
GCs tend to work in 'pages': every newly created object goes at the next available bunch of bytes on the page. Once an entire page is full, the GC works out which objects made it out of 'eden'. (The idea being that if you make an object within the confines of a single method, the object never got assigned to any field anywhere, and that method is now done, then the object is necessarily also done.) For this 'first generation' (the first time after an object was created that its page is full and garbage collection becomes relevant), the JVM generally works in reverse: it knows which objects are NOT yet done, copies those over to a new page, all at 'the front' of that new page, and then simply considers the full page empty. The work is proportional to the survivors, not to the garbage, so in this sense Java can collect fast garbage for practically zero cost, and JVMs can outperform C code based on malloc and free because of this. It also means that you SHOULD be making tons of garbage. It's often better to have an immutable type that you continually make new instances of than a single longer-lived instance. Contained, short-lived garbage is usually free.
For the next generations: as objects live on, they keep being pushed up a 'generation', with successive generations being left alone and uncollected for longer periods of time. The gist being: the longer an object has already been alive, the lower the chance that it is now eligible for collection. Objects that have lived long tend to continue to not be eligible. Other than for that first generation, GCs tend to work in 'positive mark and sweep' fashion, starting with the objects known to be 'alive' (the roots) and building an expanding graph of the objects reachable from them, which makes those 'alive' as well. But once garbage has been identified, the principle is the same: make a new page, copy the non-garbage over, then mark the old page as free without overwriting it with zeroes, because there is no need (Java does not allow pointer arithmetic, so there is no risk that code sees the un-overwritten bytes in the page).
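To make the copy-the-survivors idea concrete, here is a toy model in Java. It is only a sketch under loud assumptions: the Page and Obj classes, the byte-array "heap", and the precomputed reachable flag are all invented for illustration, and no real JVM collector is implemented this way. The marking step is skipped entirely; the sketch only shows bump allocation, copying live objects to the front of a new page, and "freeing" the old page by resetting a pointer.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a copying collector. Hypothetical; not how any real JVM is written. */
final class CopyingCollectorSketch {

    /** A fixed-size region that hands out space by bumping a pointer. */
    static final class Page {
        final byte[] bytes;
        int top = 0; // offset of the next free byte

        Page(int size) { bytes = new byte[size]; }

        /** Returns the offset of the newly allocated block, or -1 if the page is full. */
        int allocate(int size) {
            if (top + size > bytes.length) return -1;
            int offset = top;
            top += size;
            return offset;
        }
    }

    /** Hypothetical object record: where it lives, how big it is, whether it is still reachable. */
    static final class Obj {
        int offset;
        final int size;
        final boolean reachable; // assumed to have been computed by an earlier mark step

        Obj(int offset, int size, boolean reachable) {
            this.offset = offset;
            this.size = size;
            this.reachable = reachable;
        }
    }

    /** Copies reachable objects to the front of 'to', then treats 'from' as empty. */
    static List<Obj> collect(Page from, Page to, List<Obj> objects) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : objects) {
            if (!o.reachable) continue; // garbage is never touched at all
            int newOffset = to.allocate(o.size);
            if (newOffset < 0) throw new IllegalStateException("target page is full");
            System.arraycopy(from.bytes, o.offset, to.bytes, newOffset, o.size);
            o.offset = newOffset;
            survivors.add(o);
        }
        from.top = 0; // the old bytes stay in place; the page is simply reusable now
        return survivors;
    }

    public static void main(String[] args) {
        Page eden = new Page(64);
        Page survivorPage = new Page(64);
        List<Obj> objects = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            int offset = eden.allocate(8);
            eden.bytes[offset] = (byte) (i + 1); // tag each object so copies can be recognized
            objects.add(new Obj(offset, 8, i % 4 == 0)); // only objects 0 and 4 stay reachable
        }
        List<Obj> live = collect(eden, survivorPage, objects);
        System.out.println("survivors=" + live.size()
                + " eden.top=" + eden.top
                + " survivorPage.top=" + survivorPage.top);
    }
}
```

Note that collect does work only for the two survivors; the six dead objects cost nothing, and the old page is never zeroed.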

If you know how hard disk defraggers worked in times of yore: it's something like that.

Remember though: this is an oversimplified view that picks a few details from various GC implementations. Java VMs have pluggable memory architectures and there are widespread differences between how GCs actually work. The techniques named here (copy-alive-out-then-reuse-page, free-collect-fast-garbage, generational garbage collection, and mark-and-sweep) aren't universally used by every GC system available for the JVM.
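If you want to see which collector your own JVM plugged in, and how much heap it is currently holding on to, the standard java.lang.management and Runtime APIs will tell you. A minimal sketch follows; the class and method names (GcInfoSketch, collectorNames, heldBytes, occupiedBytes) are made up for this example, and the collector names printed will differ per JVM and per command-line flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints which collectors this JVM reports and how much heap it currently holds. */
final class GcInfoSketch {

    /** Names of the collectors as reported by the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Heap the JVM currently holds, in bytes, whether or not objects occupy it. */
    static long heldBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Held heap that is occupied by objects, including garbage not yet collected. */
    static long occupiedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("held by JVM:  " + heldBytes() + " bytes");
        System.out.println("occupied:     " + occupiedBytes() + " bytes");
        System.out.println("max heap:     " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```

The 'occupied' figure counts dead objects that simply haven't been collected yet, so it says little about how much memory the application actually needs.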

*) If your GC is generational with an 'eden' generation baked in, fast garbage is generally collected near-continually, but once an object has moved on from the eden generation, don't assume that garbage is going to get collected today. If the JVM has plenty of heap left, there is absolutely no reason to spend cycles on collection right this moment. It's completely feasible for something that is eligible for GC to stick around in heap for DAYS because the server is mostly idling anyway, and all objects made are fast garbage that never makes it past the eden gen. This is one of the many reasons why it is inappropriate to treat your OS's indication of the memory taken by the VM, or the VM's own reporting of memory availability, as particularly informative, and also why you should absolutely never write finalizers. Assume garbage is never cleaned up in time for whatever you needed that finalizer to do.
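Since collection timing can't be relied on, cleanup that has to happen at a known moment is usually tied to scope instead of to the GC. One common way to do that in Java is AutoCloseable with try-with-resources. A minimal sketch; TempResource and useAndRelease are hypothetical names invented for this example, and the 'resource' here is just a flag.

```java
/** Sketch of deterministic cleanup that does not depend on when (or whether) the GC runs. */
final class ResourceSketch {

    /** Hypothetical resource; a real one might wrap a file handle or a socket. */
    static final class TempResource implements AutoCloseable {
        private boolean closed = false;

        boolean isClosed() { return closed; }

        @Override
        public void close() {
            closed = true; // runs when the try block exits, not when the object is collected
        }
    }

    /** Uses a resource inside try-with-resources and returns it so callers can inspect it. */
    static TempResource useAndRelease() {
        TempResource resource = new TempResource();
        try (TempResource r = resource) {
            // ... work with r here ...
        }
        return resource; // already closed at this point, even though it is still reachable
    }

    public static void main(String[] args) {
        System.out.println("closed after try block: " + useAndRelease().isClosed());
    }
}
```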
