Apeiron

Reputation: 704

Divide and conquer of large objects for GC performance

At my work we're discussing different approaches to cleaning up a large amount (~50-100 MB) of managed memory. There are two approaches on the table (read: two senior devs can't agree) and, not having the experience, the rest of the team is unsure which approach is more desirable in terms of performance or maintainability.

The data being collected consists of many small items (~30,000), which in turn contain other items; all objects are managed. There are many references between these objects, including event handlers, but none to outside objects. We'll refer to this large group of objects and references as a single entity called a blob.

Approach #1: Make sure all references to objects in the blob are severed and let the GC handle the blob and all the connections.

Approach #2: Implement IDisposable on these objects, then call Dispose on them, set references to Nothing, and remove handlers.

The theory behind the second approach is that large, longer-lived objects take longer to clean up in the GC. So, by cutting the large objects into smaller bite-size morsels, the garbage collector will process them faster, giving a performance gain.

So I think the basic question is this: Does breaking apart large groups of interconnected objects optimize data for garbage collection, or is it better to keep them together and rely on the garbage collection algorithms to process the data for you?

I feel this is a case of premature optimization, but I do not know enough about the GC to know what helps or hinders it.

Edit: to add emphasis, the "blob" of memory is not a single large object; it is many small objects allocated separately.

A little more background in case it is helpful: we had 'leaks' in that objects were not getting GCed. Both approaches solve the leak issue, but at this point the debate is about which is more appropriate.

Upvotes: 5

Views: 665

Answers (7)

supercat

Reputation: 81149

Any object which has a finalizer should be Disposed of if at all possible before being abandoned. Abandoning an object with a finalizer should be considered the worst case from a GC performance standpoint.

Beyond that, even in the absence of finalizers, it's possible to construct scenarios in which it's better to simply detach a large blob from the rest of the world and let it die, and it's possible to construct scenarios in which it's better to break the large object apart. Generally, simply letting the large object die would be optimal except for two caveats, which may favor breaking it up:

  1. If parts of the blob are long-lived, it's unlikely that any part of it, no matter how recently allocated, will be collectible before the next level-2 garbage collection. By contrast, if one blows apart all the references holding the blob together, the parts that were most recently allocated may be eligible for a level-0 or level-1 collection. If enough of the objects within the blob are comparatively new, the effort required to kill the references held by the longer-lived parts may be less than the work the GC would otherwise do to keep the newer objects around until the next level-2 collection.
  2. If a stray reference exists to part of the blob and the blob is left intact, that reference may keep the whole blob alive. By contrast, if the blob is blown apart, the stray reference may only keep alive a small portion of it. This may be a good or bad thing. If the alternative would be keeping the whole blob alive, keeping only a small portion is probably better. On the other hand, if the choice is between finding a problem and fixing it (eliminating the stray reference) versus not finding the problem, the former might be better.
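The collection levels (generations) mentioned in the list above can be observed directly. The following is a minimal C# sketch, assuming a standard .NET runtime; the forced `GC.Collect()` calls are only there to make promotion visible and are not a recommendation, and the exact generation numbers printed may vary by runtime and GC configuration.

```csharp
using System;

static class Program
{
    static void Main()
    {
        var survivor = new object();

        // Generation of a freshly allocated object.
        Console.WriteLine(GC.GetGeneration(survivor));

        // Forced collection, for demonstration only.
        GC.Collect();
        // Generation after surviving one collection.
        Console.WriteLine(GC.GetGeneration(survivor));

        GC.Collect();
        // Generation after surviving two collections.
        Console.WriteLine(GC.GetGeneration(survivor));

        // Keep the object reachable up to this point so it is not collected early.
        GC.KeepAlive(survivor);
    }
}
```

An object that has been promoted this way is only reclaimed by a collection of its own generation or higher, which is the cost the first caveat is describing.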

I personally dislike abandoning events in any case where an outside object might hold a reference to the publisher. Pro-active clean-up seems like a better habit.

Upvotes: 0

Hans Passant

Reputation: 941417

Neither approach makes sense. The GC has no trouble with detecting circular references or complicated object graphs. No point in setting references to null. IDisposable does nothing to improve GC perf.

If there's any lead in how you solved the problem, it is in setting events to null. They have a knack for keeping objects referenced if they are implemented "backwards". In other words: keeping the originator of the event alive and tearing down its clients. Unsubscribing then has to be done explicitly.

But trying to guess at this was the wrong approach to start with. Any decent memory profiler would have shown you what reference was keeping a graph alive.
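To illustrate the "backwards" event case described above, here is a minimal C# sketch. The `Publisher` and `Subscriber` types are hypothetical and not taken from the question; the point is only that the publisher's delegate list holds a reference to each subscriber until it is removed.

```csharp
using System;

// Hypothetical long-lived object that raises an event.
class Publisher
{
    public event EventHandler Changed;

    public void Raise() => Changed?.Invoke(this, EventArgs.Empty);
}

// Hypothetical short-lived client of the publisher.
class Subscriber
{
    public void OnChanged(object sender, EventArgs e) { }
}

static class Program
{
    static void Main()
    {
        var publisher = new Publisher();
        var subscriber = new Subscriber();

        // The delegate stored in publisher.Changed references subscriber,
        // so subscriber stays reachable for as long as publisher is alive.
        publisher.Changed += subscriber.OnChanged;
        publisher.Raise();

        // Explicit unsubscription removes that reference.
        publisher.Changed -= subscriber.OnChanged;

        // With the handler removed, dropping this local leaves no path
        // from publisher to the subscriber object.
        subscriber = null;

        GC.KeepAlive(publisher);
    }
}
```

If the publisher itself is part of the abandoned blob, none of this is needed, because the whole graph becomes unreachable together.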

Upvotes: 3

jpalecek

Reputation: 47762

Approach #2: Implement IDisposable on these objects, then call Dispose on them, set references to Nothing, and remove handlers.

...

The theory behind the second approach is that large, longer-lived objects take longer to clean up in the GC. So, by cutting the large objects into smaller bite-size morsels, the garbage collector will process them faster, giving a performance gain.

I think this is not true; a garbage collector's cost typically depends on the number of live objects and their references, and (depending on the type of GC) on the number of dead objects. Once you no longer need an object (or objects) and cut the reference paths from the root objects to it, the number of references between the "garbage" objects doesn't matter. So, I'd say, just be sure there are no dangling references from outside the "blobs" and you'll be OK.

Upvotes: 1

Reed Copsey

Reputation: 564413

The second approach is faulty: it assumes that implementing IDisposable will impact the garbage collector.

Unfortunately, IDisposable has nothing to do with garbage collection. It is purely about releasing unmanaged resources. It sounds like your 2nd senior dev is trying to be a bit "too clever" for their own good.

The first approach should be fine. As soon as you stop referencing the "blob", every object within the blob will become unrooted, and it should get cleaned up. This may happen at some indeterminate time after you release the reference (unless you explicitly tell the GC to collect, which I don't recommend). The interdependencies will be handled correctly for you.

Suppose that implementing IDisposable and cleaning up the internal references could, theoretically, speed up the collection process. Even if there were a (small) net gain, the time spent processing all of that data would most likely outweigh any gains in the GC, and it is really outside of your business concern.

However, I suspect it would actually slow down the garbage collector overall, not speed it up. Breaking up the data set into lots of objects will not help the GC run faster - it still has to track through the live references, which are no different in this situation.
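The claim that interdependencies inside an unrooted blob are handled for you can be checked with a small experiment. This is a minimal C# sketch under stated assumptions: `Item` is a hypothetical stand-in for the objects in the blob, and the forced `GC.Collect()` calls exist only to make the result observable. The outcome can differ in debug builds or under a debugger, where references may be kept alive longer.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical blob member with a peer reference and an event.
class Item
{
    public Item Peer;
    public event EventHandler Changed;

    public void OnPeerChanged(object sender, EventArgs e) { }
    public void Raise() => Changed?.Invoke(this, EventArgs.Empty);
}

static class Program
{
    // Built in a separate, non-inlined method so that no local variable
    // in Main keeps the blob reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference BuildAndAbandonBlob()
    {
        var a = new Item();
        var b = new Item();

        // Circular references inside the blob.
        a.Peer = b;
        b.Peer = a;

        // Event handlers inside the blob also reference each other.
        a.Changed += b.OnPeerChanged;
        b.Changed += a.OnPeerChanged;

        // A weak reference tracks the object without keeping it alive.
        return new WeakReference(a);
    }

    static void Main()
    {
        WeakReference tracker = BuildAndAbandonBlob();

        // Forced collection, for demonstration only.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Console.WriteLine("Blob still alive: " + tracker.IsAlive);
    }
}
```

No internal reference or handler is cleared here; the only thing that changes is that nothing outside the blob refers to it any more.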

Upvotes: 9

vittore

Reputation: 17579

Take a look at http://msdn.microsoft.com/en-us/magazine/cc534993.aspx

Upvotes: 1

Stephen Kellett

Reputation: 3236

Microsoft imply that Dispose is faster than Finalize if you want performance for objects that hold unmanaged resources (file handles, GDI handles, etc.). I don't think that is what you are trying to achieve (you haven't said anything about unmanaged resources).

Let the GC do its thing (as I type this, two other answers appear, saying the same thing, pretty much).

Upvotes: 1

zneak

Reputation: 138051

The IDisposable interface has nothing to do with garbage collection.

It happens that some objects (like file streams) hold resources that can be precious (since the file descriptor limit for a process is usually much lower than the memory limit on modern operating systems). However, the garbage collector does not track them; thus, if you're running out of file descriptors but still have plenty of memory, the garbage collector might not run.

The IDisposable interface sets a mechanism by which you can rest assured that all unmanaged resources associated with a managed object will be released once the object actually becomes useless, and not only when the garbage collector decides to run.

Consequently, making objects IDisposable will not impact how objects are garbage-collected. Even using the Dispose method to clear all references will have little to no impact on garbage collector runs; just clearing the references to your blob will let all your smaller objects become unrooted at once.
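The deterministic release that IDisposable does provide, as described above, looks like this in practice. This is a minimal C# sketch using a temporary file; the file and the byte written are arbitrary and only serve the illustration.

```csharp
using System;
using System.IO;

static class Program
{
    static void Main()
    {
        // Creates an empty temporary file and returns its path.
        string path = Path.GetTempFileName();

        using (var stream = new FileStream(path, FileMode.Open))
        {
            stream.WriteByte(42);
        } // Dispose() runs here and closes the file handle immediately.

        // The handle is already released, so the file can be deleted
        // without waiting for any garbage collection.
        File.Delete(path);
    }
}
```

The memory of the `FileStream` object itself is still reclaimed later by the garbage collector, on its own schedule; Dispose only releases the handle.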

Upvotes: 2
