Slobodan Savkovic

Reputation: 1057

.NET GC Stalling Desktop Application - Performance Issue

I am working on a large Windows desktop application that stores a large amount of data in the form of a project file. We use our own custom ORM and serialization code to efficiently load the object data from CSV format. Loading is performed by multiple threads that process multiple files in parallel. Our large projects can contain a million or more objects, with many relationships between them.

Recently I was tasked with improving project-open performance, which had deteriorated for very large projects. Profiling showed that most of the time is spent in garbage collection (GC).

My theory is that, because of the large number of very fast allocations, the GC is starved: it is postponed for a very long time, and when it finally kicks in it takes a very long time to do the job. Two seemingly contradictory observations support this idea:

  1. Optimizing the deserialization code to run faster only made things worse.
  2. Inserting Thread.Sleep calls at strategic places made loading faster.

Example of a slow load, with 7 generation 2 collections and a huge percentage of time spent in GC (image: Bad).

Example of a fast load, with sleep periods in the code to give the GC some time. In this case we have 19 generation 2 collections and more than double the number of generation 0 and generation 1 collections (image: Good).

So my question is: how can I prevent this GC starvation? Adding Thread.Sleep looks silly, and it is very difficult to guess the right number of milliseconds and the right places to put the calls. My other idea would be to call GC.Collect, but that raises the same problem of how many calls to make and where to put them. Any other ideas?

Upvotes: 2

Views: 429

Answers (2)

Slobodan Savkovic

Reputation: 1057

So it appears that this is a .NET bug rather than GC starvation. The workarounds and answers described in the question "Garbage Collection and Parallel.ForEach Issue After VS2015 Upgrade" apply perfectly. I got the best results by switching to server GC mode.

Note, however, that I am experiencing this issue in .NET 4.5.2. I will add a hotfix link if there is one.

Upvotes: 0

Chris Shain

Reputation: 51359

Based on the comments, I'd guess that you are doing a ton of String.Substring() operations as part of CSV parsing. Each of these creates a new string instance, which I'd bet you then throw away after parsing it further into an integer, a date, or whatever you need. You almost certainly need to start thinking about a different persistence mechanism (CSV has a lot of shortcomings that you are undoubtedly aware of), but in the meantime you will want to look into parsers that do not allocate substrings.

If you dig into the code for Int32.TryParse, you'll find that it iterates over characters to avoid allocating more strings. I'd bet you could spend an hour writing versions that take start and end parameters; then you can pass them the whole line with offsets and avoid a Substring call for each individual field value. Doing that will save you millions of allocations.
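A minimal sketch of the offset-based approach, written in Java purely for illustration. The class and method names (`OffsetParsing`, `parseInt`, `parseRow`) are hypothetical, and the sketch assumes unquoted, comma-separated integer fields. Each field is parsed directly from its `[start, end)` range in the line, and the results go into a buffer that is reused for every line, so the success path allocates no per-field strings. The same loop translates almost line for line to C#.

```java
// Hypothetical sketch: parse comma-separated integer fields using [start, end)
// offsets into the line instead of calling substring() for each field.
// Assumes unquoted fields containing only an optional sign and decimal digits.
public final class OffsetParsing {

    // Parses the decimal integer in line[start, end) without allocating a String.
    public static int parseInt(CharSequence line, int start, int end) {
        if (start >= end) {
            throw new NumberFormatException("empty field at " + start);
        }
        int i = start;
        boolean negative = false;
        char first = line.charAt(i);
        if (first == '-' || first == '+') {
            negative = first == '-';
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits at " + start);
            }
        }
        // Accumulate negatively so Integer.MIN_VALUE is representable,
        // checking for overflow before each multiply and subtract.
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multMin = limit / 10;
        int result = 0;
        for (; i < end; i++) {
            int digit = line.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("bad digit at " + i);
            }
            if (result < multMin) {
                throw new NumberFormatException("overflow at " + i);
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow at " + i);
            }
            result -= digit;
        }
        return negative ? result : -result;
    }

    // Scans the line once, writing each field into a caller-supplied buffer
    // that can be reused for every line. Returns the number of fields parsed.
    public static int parseRow(CharSequence line, int[] fields) {
        int count = 0;
        int start = 0;
        int length = line.length();
        for (int i = 0; i <= length; i++) {
            if (i == length || line.charAt(i) == ',') {
                if (count == fields.length) {
                    throw new IllegalArgumentException("more fields than buffer slots");
                }
                fields[count++] = parseInt(line, start, i);
                start = i + 1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] buffer = new int[8]; // allocated once, reused across lines
        int n = parseRow("12,-7,300,0", buffer);
        // prints "4 fields: 12 -7 300 0"
        System.out.println(n + " fields: " + buffer[0] + " " + buffer[1] + " " + buffer[2] + " " + buffer[3]);
    }
}
```

Quoted fields, surrounding whitespace, and other column types (dates, decimals) would each need their own offset-based parser written in the same style.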

Upvotes: 1
