Reputation: 25812
I understand that in Java, if an object doesn't have any references to it any more, the garbage collector will reclaim it back some time later.
But how does the garbage collector know that an object has or has not references associated to it?
Is garbage collector using some kind of hashmap or table?
Please note that I am not asking how generally GC works. Really, I am not asking that. I am asking specifically how GC knows which objects are live and which are dead, with efficiencies.
That's why I say in my question that is GC maintain some kind of hashmap or set, and consistently update the number of references an object has?
Upvotes: 26
Views: 14852
Reputation: 589
The truth is that the garbage collector does not, in general, quickly know which objects no longer have any incoming references. And, in fact, an object can be garbage even when there are incoming references it.
The garbage collector uses a traversal of the object graph to find the objects that are reachable. Objects that are not reached in this traversal are deemed garbage, even if they are part of a cycle of references. The delay between an object being unreachable, and the garbage collector actually collecting the object, could be arbitrarily long.
Upvotes: 2
Reputation: 120868
There is no efficient way - it will still require traversal of the heap, but there is a hacky way: when the heap is divided into smaller pieces (thus no need to scan the entire heap). This is the reason we have generational garbage collectors, so that the scanning takes less time.
This is relatively "easy" to answer when your entire application is stopped and you can analyze the graph of objects. It all starts from GC roots
(I'll let you find the documentation for what these are), but basically these are "roots" that are not collected by the GC
.
From here a certain scan starts that analyzes the "live" objects: objects that have a direct (or transitive) connection to these roots, thus not reclaimable. In graph theory this is know to "color/traverse" your graph by using 3 colors: black, grey and white. White
means it is not connected to the roots, grey
means it's sub-graph is not yet traversed, black
means traversed and connected to the roots. So basically to know what exactly is dead/alive right now - you simply need to take all your heap that is white initially and color it to black. Everything that is white
is garbage. It is interesting that "garbage" is really identified by a GC
by knowing what is actually alive. There are some drawings to visualize this here for example.
But this is the simple scenario: when your application is entirely stopped (for seconds at times) and you can scan the heap. This is called a STW
- stop the world event and people hate these usually. This is what parallel collectors do: stop everything, do whatever GC has to (including finding garbage), let the application threads start after that.
What happens when you app is running and you are scanning the heap? Concurrently
? G1/CMS
do this. Think about it: how can you reason about a leaf from a graph being alive or not when your app can change that leaf via a different thread.
Shenandoah
for example, solves this by "intercepting" changes over the graph. While running concurrently with your application, it will catch all the changes and insert these to some thread local special queues, called SATB Queues
(snapshot at the begging queues); instead of altering the heap directly. When that is finished, a very short STW
event will occur and these queues will be drained. Still under the STW
what that drain has "caused" is computed, i.e. : extra coloring of the graph. This is far simplified, just FYI. G1
and CMS
do it differently AFAIK.
So in theory, the process is not really that complicated, but implementing it concurrently is the most challenging part.
Upvotes: 0
Reputation: 7944
Java has a variety of different garbage collection strategies, but they all basically work by keeping track which objects are reachable from known active objects.
A great summary can be found in the article How Garbage Collection works in Java but for the real low-down, you should look at Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine
An object is considered garbage when it can no longer be reached from any pointer in the running program. The most straightforward garbage collection algorithms simply iterate over every reachable object. Any objects left over are then considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.
Beginning with the J2SE Platform version 1.2, the virtual machine incorporated a number of different garbage collection algorithms that are combined using generational collection. While naive garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of most applications to avoid extra work.
The most important of these observed properties is infant mortality. ...
I.e. many objects like iterators only live for a very short time, so younger objects are more likely to be eligible for garbage collection than much older objects.
For more up to date tuning guides, take a look at:
Incidentally, be careful of trying to second guess your garbage collection strategy, I've known many a programs performance for be trashed by over zealous use of System.gc()
or inappropriate -XX
options.
Upvotes: 6
Reputation: 500475
A typical modern JVM uses several different types of garbage collectors.
One type that's often used for objects that have been around for a while is called Mark-and-Sweep. It basically involves starting from known "live" objects (the so-called garbage collection roots), following all chains of object references, and marking every reachable object as "live".
Once this is done, the sweep stage can reclaim those objects that haven't been marked as "live".
For this process to work, the JVM has to know the location in memory of every object reference. This is a necessary condition for a garbage collector to be precise (which Java's is).
Upvotes: 13
Reputation: 115358
GC will know that object can be removed as quickly as it is possible. You are not expected to manage this process.
But you can ask GC very politely to run using System.gc()
. It is just a tip to the system. GC does not have to run at that moment, it does not have to remove your specific object etc. Because GC is the BIG boss and we (Java programmers) are just its slaves... :(
Upvotes: 2