Anders Forsgren

Reputation: 11111

How to reduce time spent in GC

I'm creating a desktop application that has a compute-heavy operation that potentially runs for several seconds. Obviously there is a need to minimize the time of this operation. The operation is fairly easy to parallelize (individual subtasks), and each subtask takes around 50ms on a single thread. On multiple threads, each subtask takes 4-5 times as long because 40-50% of the time is spent in GC, effectively cancelling the speedup completely.

So I need to give the GC less work. My first thought was to try to find which type of object was being garbage collected the most, but I realized that although I often do memory profiling, I had never searched for a pattern like this. Usually I look at heap snapshots, or differences between heap snapshots, but these show objects that are alive, not the objects that were created and disposed between those snapshots. So that is my first question: what is the easiest way to find which types are created and garbage collected the most? I tried looking at method call counts to see if some constructor was called suspiciously often, but all objects created in the millions were only small struct types. These should have no effect on GC even if boxed, if I understand things correctly?

The algorithm creates hundreds of thousands of individual result point objects. These of course aren't supposed to be gc'd because they represent the output of the operation. But it leads me to my second question: does the time spent in GC depend mostly on the total number of objects, or mostly on the number of objects actually collected? Should I try to limit the number of result objects and instead use fewer but larger result objects?

Edit: I found the time spent in GC by using the VS 2010 concurrency visualizer. Also, in the parallel piece of code, most of the blocked sections of the threads were waiting for GC.

Edit: I should clarify that the performance problem is because the execution is effectively serialized on the workstation GC. See for example the performance problem described in this post.

http://blogs.msdn.com/b/hshafi/archive/2010/06/17/case-study-parallelism-and-memory-usage-vs2010-tools-to-the-rescue.aspx

I can't do anything about the garbage collector blocking my threads (and I don't think I want the server GC for a desktop app, correct?). So in order to get a linear speedup for this operation, I need to reduce the number of times the GC is invoked. Most of the time wasted is actually wasted by other threads blocked waiting for one thread to do GC.

Upvotes: 8

Views: 2054

Answers (4)

smirkingman

Reputation: 6368

Old question, but for those that stumble on it...

I had exactly the same problem and fixed it permanently by setting server-mode garbage collection http://msdn.microsoft.com/en-us/library/ms229357(v=vs.110).aspx.

In app.config add:

  <configuration>
    <runtime>
      <gcServer enabled="true" />
    </runtime>
  </configuration>

That alone sped up my code by an order of magnitude, with no side effects that I could find.
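To confirm that the setting above actually took effect, the running process can report which GC flavour it got. This is a minimal sketch, not part of the original answer; the module name GcModeCheck is made up for illustration, while GCSettings.IsServerGC and GCSettings.LatencyMode are standard properties in System.Runtime.

```vbnet
Imports System
Imports System.Runtime

' Hypothetical module name, for illustration only
Module GcModeCheck
    Sub Main()
        ' True only if the runtime really started with the server GC
        Console.WriteLine("Server GC: {0}", GCSettings.IsServerGC)
        ' Current latency mode of the collector
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode)
        ' Number of logical processors visible to the process
        Console.WriteLine("Logical processors: {0}", Environment.ProcessorCount)
    End Sub
End Module
```

If this prints False for the server GC even though the config file is in place, the file is probably not being picked up (for example, it is not named after the executable).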

If you know exactly where you're generating a lot of GCs, I also found that LowLatency http://msdn.microsoft.com/en-us/library/system.runtime.gclatencymode(v=vs.110).aspx brought my GCs down to a single generation-1 GC:

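Imports System.Runtime                   ' GCSettings, GCLatencyMode
Imports System.Runtime.CompilerServices  ' RuntimeHelpers

GC.Collect() ' pre-emptively collect before the time-critical region
Dim oldMode As GCLatencyMode = GCSettings.LatencyMode
RuntimeHelpers.PrepareConstrainedRegions()

Try
    ' Suppress generation-2 collections while the critical work runs
    GCSettings.LatencyMode = GCLatencyMode.LowLatency

    ' Work that allocates tons of memory here

Finally
    ' Always restore the previous mode
    GCSettings.LatencyMode = oldMode

End Try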

(The PrepareConstrainedRegions hopefully ensures that the Finally block is always executed, but I'm not entirely sure this is correct).

Upvotes: 1

Spence

Reputation: 29360

Perhaps you should look at increasing the cache hits between your objects.

So rather than creating new struct points and then performing calculations in lists/enumerables, have you tried allocating a fixed array of points and then continuously reusing them? That way you allocate the objects only once, perform your calculations, and then return. You will benefit from a hot cache, and you will not suffer any GC if you are able to completely reuse the array.
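The reuse idea can be sketched roughly as follows. This is only an illustration under assumptions: ResultPoint, ComputeInto, and the buffer size are hypothetical names and values, not taken from the question, and the arithmetic is a stand-in for the real computation.

```vbnet
Imports System

Module BufferReuseSketch
    ' Hypothetical result type; Structure elements are stored inline in the array
    Structure ResultPoint
        Public X As Double
        Public Y As Double
    End Structure

    ' Writes results into a caller-supplied array instead of allocating per call
    Sub ComputeInto(points As ResultPoint(), seed As Double)
        For i As Integer = 0 To points.Length - 1
            points(i).X = seed + i   ' placeholder for the real calculation
            points(i).Y = seed * i
        Next
    End Sub

    Sub Main()
        Dim points(9999) As ResultPoint   ' 10,000 elements, allocated once
        Dim before As Integer = GC.CollectionCount(0)

        For pass As Integer = 0 To 999
            ComputeInto(points, pass)     ' every pass reuses the same array
        Next

        ' Reports how many generation-0 collections happened during the loop
        Console.WriteLine("Gen 0 collections during the loop: {0}",
                          GC.CollectionCount(0) - before)
    End Sub
End Module
```

In a parallel version, each worker thread would need its own array so that subtasks do not overwrite each other's results.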

Upvotes: 1

Tony Hopkinson

Reputation: 20330

These result point objects: are they the standard struct Point? I can't say from here, but have you tried pre-allocating the space for them? Most of your GC calls could be triggered by allocating memory for them, which is a lot of effort. Allocating them in larger blocks, or even in one go if the amount can be calculated, should give you a boost.

Another option might be trundling into unsafe code, provided you can get that permission on the workstation. I don't know how you have your points laid out, but there might be some future in just allocating a block of memory and then ripping through it with pointer arithmetic.

Upvotes: 0

Mitchel Sellers

Reputation: 63136

Personally, I think that if your tasks are taking only 50ms to execute, the overhead of thread creation etc. is going to take more time than your actual jobs, which is what it appears you are seeing. So you might not be able to get too far with it.

As for seeing what is out there, the best tools that I've used are ANTS Profiler (Memory and Performance). From there you can see objects in memory, and differences between points in time as well as "number of executions" which should get you what you want.

Upvotes: 4
