Darren Young

Reputation: 11100

C# garbage collection

I have written a business app that recurses through a directory structure looking for specific Excel files and stores their paths. It then loops through these files and parses each one by creating a DocumentParser object for it; this is done one file at a time, not asynchronously. The software seems to be very stable, so much so that the business would like to run it against a massive directory containing upwards of 10,000 relevant Excel files.

My question is: since I am creating a new DocumentParser object each time, will the GC be effective enough to discard each object when it goes out of scope, i.e. when that Excel sheet has been parsed, or is there a way I can monitor this and, where necessary, manually trigger a GC? I've never had to deal with such large amounts of data before; I've generally only tested it on a maximum of 40-50 Excel files at a time.

Thanks.

Upvotes: 1

Views: 349

Answers (4)

Niall Connaughton

Reputation: 16117

I would leave the GC to its business. 10,000 objects is not really much work for the GC. And it's likely the cost of the GC work will be much lower than the cost of the Excel work. So it's not worth complicating your design to tweak things for the GC. If you end up with so many files to process that your application can't finish in time, it's most likely going to be the speed of the Excel processing holding you up.

However, one note which may be relevant: if the DocumentParser uses unmanaged memory in its work with the Excel file, you can use GC.AddMemoryPressure and GC.RemoveMemoryPressure to tell the GC the real added cost of opening the file. If you didn't write the DocumentParser yourself, the author may already be doing this.

The issue here is that you may have a managed object that costs something on the order of 100 bytes, but which allocates a large amount of unmanaged memory when it does Excel work. The GC has no way of knowing this, so these methods notify the GC that there is more memory pressure than it was aware of. This may change how and when it decides to collect, which may lead to the application maintaining a lower memory footprint. If the application's memory usage balloons over time, you may start seeing slowdowns from lengthy garbage collections and possibly paging on the machine (depending on how much memory you have). You'll want to keep an eye on its memory usage to make sure it's not leaking memory as it processes - a memory profiler may be helpful there.
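A minimal sketch of what that could look like, assuming the parser owns a block of unmanaged memory. NativeWorkbookBuffer is a hypothetical class made up for illustration (it is not part of any Excel library), and the buffer is a stand-in for whatever unmanaged allocation the real parser makes. The only real APIs used are GC.AddMemoryPressure, GC.RemoveMemoryPressure and Marshal.AllocHGlobal/FreeHGlobal.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around an unmanaged buffer (illustration only).
public sealed class NativeWorkbookBuffer : IDisposable
{
    private readonly long _size;
    private IntPtr _buffer;

    public NativeWorkbookBuffer(long sizeInBytes)
    {
        if (sizeInBytes <= 0) throw new ArgumentOutOfRangeException(nameof(sizeInBytes));
        _size = sizeInBytes;
        _buffer = Marshal.AllocHGlobal(new IntPtr(sizeInBytes));
        // Tell the GC that this small managed object is really holding
        // sizeInBytes of unmanaged memory.
        GC.AddMemoryPressure(sizeInBytes);
    }

    public bool IsDisposed
    {
        get { return _buffer == IntPtr.Zero; }
    }

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    // Safety net in case Dispose is never called.
    ~NativeWorkbookBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_buffer == IntPtr.Zero) return;   // already released
        Marshal.FreeHGlobal(_buffer);
        _buffer = IntPtr.Zero;
        // Each AddMemoryPressure call should be matched by a
        // RemoveMemoryPressure call with the same amount.
        GC.RemoveMemoryPressure(_size);
    }
}

public static class Program
{
    public static void Main()
    {
        using (var buffer = new NativeWorkbookBuffer(1024 * 1024))
        {
            Console.WriteLine("Buffer live: " + !buffer.IsDisposed);
        }
    }
}
```

Disposing each wrapper as soon as its file has been parsed (the using block) releases the unmanaged memory deterministically; the memory pressure calls only matter for instances that are left for the GC to find.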

Upvotes: 2

Russ Clarke

Reputation: 17909

Yes and no - The GC is effective enough to release when it needs to, but you can't generally be sure when that is.

There is a way to force a collection (GC.Collect()), but it's generally considered bad practice in production code: forcing a stack walk when it's not required costs more than using a bit of extra memory until the GC decides it needs to free resources to allocate more objects.
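For the monitoring half of the question, here is a rough sketch that only observes the GC instead of forcing it. GcMonitor, ProcessAll, parseFile and reportEvery are names invented for this example; parseFile stands in for whatever creates and runs a DocumentParser. GC.GetTotalMemory, GC.CollectionCount and Process.PrivateMemorySize64 are the real APIs doing the work.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class GcMonitor
{
    // Runs parseFile on every path and prints memory statistics
    // every 'reportEvery' files. Returns the number of files processed.
    public static int ProcessAll(IEnumerable<string> paths, Action<string> parseFile, int reportEvery = 100)
    {
        if (reportEvery <= 0) throw new ArgumentOutOfRangeException(nameof(reportEvery));

        int processed = 0;
        foreach (string path in paths)
        {
            parseFile(path);
            processed++;

            if (processed % reportEvery == 0)
            {
                long privateBytes;
                using (Process self = Process.GetCurrentProcess())
                {
                    // Includes unmanaged memory, which the GC figures below do not.
                    privateBytes = self.PrivateMemorySize64;
                }

                Console.WriteLine(
                    "{0} files | managed heap: {1:N0} bytes | private bytes: {2:N0} | gen0/1/2 collections: {3}/{4}/{5}",
                    processed,
                    GC.GetTotalMemory(false),   // false = do not force a collection first
                    privateBytes,
                    GC.CollectionCount(0),
                    GC.CollectionCount(1),
                    GC.CollectionCount(2));
            }
        }
        return processed;
    }
}

public static class Program
{
    public static void Main()
    {
        var paths = new List<string>();
        for (int i = 0; i < 250; i++) paths.Add("file" + i + ".xlsx");

        // Stand-in for the real parsing work: allocate some short-lived garbage.
        GcMonitor.ProcessAll(paths, path => { var scratch = new byte[64 * 1024]; });
    }
}
```

If the managed heap figure stays roughly flat while private bytes keep climbing, that points at unmanaged memory (or undisposed resources) rather than at the GC falling behind.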

Upvotes: 1

PVitt

Reputation: 11770

The GC is a very complex piece of software, and it is the only one that knows when garbage collection is necessary. So my advice is to leave the GC alone.

Additionally: the GC will handle this many objects. You may notice a decrease in performance. If that is a problem you can try to optimize your code, but don't do it prematurely.

Upvotes: 4

Stilgar

Reputation: 23591

You don't need to manually call the GC unless you are holding some very large resource, which is not the case in your situation. The GC tunes itself with every collection, and if you call it manually you will just disrupt its internal profiling data.

BTW, the GC can collect an object not only when it goes out of scope but also after its last usage (i.e. while the variable is still in scope but is not used anymore).
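The flip side of that last point is GC.KeepAlive, which exists for the rare case where an object must stay alive past its last visible use. A small sketch, with LifetimeDemo and SurvivesWithKeepAlive being made-up names and a plain object standing in for a DocumentParser; the GC.Collect() call is there purely to demonstrate the effect, not as a recommendation:

```csharp
using System;

public static class LifetimeDemo
{
    // Reports whether the object was still alive after a forced collection,
    // given that GC.KeepAlive is called on it afterwards.
    public static bool SurvivesWithKeepAlive()
    {
        var parser = new object();             // stand-in for a DocumentParser
        var weak = new WeakReference(parser);  // lets us observe it without keeping it alive

        GC.Collect();                          // forced only for demonstration
        bool aliveAfterCollect = weak.IsAlive;

        // KeepAlive makes 'parser' count as reachable up to this line, so the
        // collection above was not allowed to reclaim it.
        GC.KeepAlive(parser);
        return aliveAfterCollect;
    }

    public static void Main()
    {
        Console.WriteLine("Alive after collect: " + SurvivesWithKeepAlive());
    }
}
```

Without the KeepAlive call, an optimized build is free to treat the object as dead right after the WeakReference is created, so the same check could go either way depending on build configuration and runtime.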

Upvotes: 1
