SimonC

Reputation: 6718

Large Object Heap friendly IDictionary

We have an application that holds large numbers of objects in several Dictionaries, some of which grow continually during the lifetime of the app (a trading application with lots of instruments and continually growing orders/trades).

We are having problems with OutOfMemoryExceptions due to fragmentation of the large object heap.

To counter this, I've tried to write a 'large' dictionary that is implemented as a two-level dictionary, where all the leaf dictionaries stay small enough that they are never allocated on the LOH. I've used a consistent hashing algorithm to avoid having to rehash the entire dictionary when a single bucket becomes too large. The consistent hashing 'circle' is a TreeDictionary from the C5 collections library.

Are there any better data structures (or perhaps better implementations of the one I described) for C#?

This is the implementation for the 'large' dictionary: https://gist.github.com/956621
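
In outline, it looks roughly like the sketch below. This is not the gist itself: the leaf count, the leaf-size assumption and the evenly spaced ring points are simplifications, and the gist's consistent-hashing split logic (built on the C5 TreeDictionary) is omitted here.

```csharp
using System;
using System.Collections.Generic;

public class LargeDictionarySketch<TKey, TValue>
{
    private readonly uint[] _ringPoints;                   // sorted hash points on the ring
    private readonly Dictionary<TKey, TValue>[] _leaves;   // one small leaf dictionary per point

    public LargeDictionarySketch(int leafCount = 64)
    {
        _ringPoints = new uint[leafCount];
        _leaves = new Dictionary<TKey, TValue>[leafCount];
        for (int i = 0; i < leafCount; i++)
        {
            // Spread the ring points evenly over the uint hash space.
            _ringPoints[i] = (uint)i * (uint.MaxValue / (uint)leafCount);
            _leaves[i] = new Dictionary<TKey, TValue>();
        }
    }

    private Dictionary<TKey, TValue> LeafFor(TKey key)
    {
        uint hash = unchecked((uint)EqualityComparer<TKey>.Default.GetHashCode(key));
        // The first ring point at or above the hash owns the key; wrap to the first point.
        int index = Array.BinarySearch(_ringPoints, hash);
        if (index < 0) index = ~index;                     // complement of the insertion point
        if (index == _ringPoints.Length) index = 0;
        return _leaves[index];
    }

    public void Add(TKey key, TValue value) => LeafFor(key).Add(key, value);
    public bool TryGetValue(TKey key, out TValue value) => LeafFor(key).TryGetValue(key, out value);
}
```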

I understand it's not foolproof, since neither the LOH threshold nor the size of each Dictionary entry (or its scaling algorithm) is part of the specification. However, this is currently the best I can think of to avoid the application blowing up mid-day.

Upvotes: 10

Views: 2164

Answers (2)

Rick Sladkey

Reputation: 34250

A dictionary is an unfortunate data structure when it is the largest one in your application. The hash table is roughly doubled in size when it becomes too full, and that requires 150% over-allocation during the resize, right at the critical time. The hash table works superbly when it's gigantic, but it requires contiguous allocation, which stresses heap algorithms.
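
To put rough numbers on that (assuming the .NET Framework Dictionary layout of an int[] buckets array plus an Entry[] entries array, and the usual 85,000-byte LOH threshold; exact sizes depend on your key and value types):

```csharp
using System;

// Back-of-envelope only: each entry carries an int hash code, an int "next" index,
// the key and the value. With reference-type keys and values on 64-bit that is about
// 4 + 4 + 8 + 8 = 24 bytes per entry.
const int LohThresholdBytes = 85000;
const int ApproxBytesPerEntry = 24;

int entriesUntilLoh = LohThresholdBytes / ApproxBytesPerEntry;   // roughly 3,500 entries
Console.WriteLine("Entries array reaches the LOH at about " + entriesUntilLoh + " elements.");

// Each resize allocates new buckets and entries arrays while the old ones are still
// live and being copied from, so the spike in contiguous memory comes exactly when
// the dictionary is already at its largest.
```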

You can diminish these disadvantages with multi-level hash tables, for example using a byte of the hash code as an index into 256 hash tables. This adds some overhead for sure, but more importantly, this and other such strategies are fraught with peril: by fiddling with the randomness, such as it is, of the hash codes you get, you can potentially make things much, much worse performance-wise. Using this approach requires a good theoretical foundation and solid empirical testing. But it can work.
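
A minimal sketch of that 256-sub-table idea, with illustrative names and none of the theoretical or empirical validation mentioned above:

```csharp
using System.Collections.Generic;

// Each shard is an ordinary Dictionary, so resizes happen per-shard on arrays
// roughly 1/256th the size of a single monolithic table.
public class ByteShardedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue>[] _shards = new Dictionary<TKey, TValue>[256];

    public ByteShardedDictionary()
    {
        for (int i = 0; i < _shards.Length; i++)
            _shards[i] = new Dictionary<TKey, TValue>();
    }

    private Dictionary<TKey, TValue> ShardFor(TKey key)
    {
        // The low byte of the hash code picks the shard. This is exactly the "fiddling
        // with the randomness" caveat above: those bits no longer add entropy inside
        // the shard, so a poor GetHashCode can hurt twice.
        int hash = EqualityComparer<TKey>.Default.GetHashCode(key);
        return _shards[hash & 0xFF];
    }

    public void Add(TKey key, TValue value) => ShardFor(key).Add(key, value);
    public bool TryGetValue(TKey key, out TValue value) => ShardFor(key).TryGetValue(key, out value);
}
```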

Another strategy is to preallocate the biggest data structure for the worst case, and to allocate it early. No fine-grained allocation is necessary, but now you face the spectre of catastrophic failure if it should ever run out. It's an option.
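
A sketch of that option; the worst-case figure and the Order type are assumptions, the point being that the one large, contiguous allocation happens at startup while the heap is still unfragmented:

```csharp
using System.Collections.Generic;

// Hypothetical placeholder for the real order type.
class Order { }

class TradingCaches
{
    // Assumed worst-case bound for a trading day; derive the real figure empirically.
    const int WorstCaseOrderCount = 2000000;

    // The capacity constructor allocates the buckets/entries arrays immediately, so no
    // further resizing (and no mid-day LOH allocation) occurs unless the bound is exceeded.
    static readonly Dictionary<long, Order> Orders =
        new Dictionary<long, Order>(WorstCaseOrderCount);
}
```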

Upvotes: 3

Euphoric

Reputation: 12849

I think this calls for a change of algorithm.

From what I have heard and understood, the GC is quite good at compacting and defragmenting memory. So your problem stems from the simple fact that you keep too much data in memory.

How much data do you keep in memory?

Have you thought about using a database? A compact, embedded one might be enough.

Or simply tell your client that to run your application correctly, they need 16 GB of memory. And if your application really needs all 16 GB, then there is definitely something wrong.

Looking at your problem from a different angle, and after reading your edit, I have a question: how big are your objects? Do they contain long lists or arrays? How often do you remove/add those objects?

I think the problem might not be the dictionary itself, but objects that are too big and are removed/added too often. Some kind of caching or object pool might be profitable. And if you use lists, create them with a preallocated capacity.
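
For example (the pooled type and the list capacity are placeholders, not anything from your application):

```csharp
using System.Collections.Generic;

// A very small object pool: recycle large, frequently replaced objects instead of
// churning them through the GC. T is a placeholder for whatever the app allocates.
public class SimplePool<T> where T : new()
{
    private readonly Stack<T> _items = new Stack<T>();

    public T Rent() => _items.Count > 0 ? _items.Pop() : new T();

    public void Return(T item) => _items.Push(item);
}

// For the list suggestion: give List<T> its capacity up front so the backing array
// is allocated once instead of being grown (and copied) repeatedly.
// var fills = new List<double>(4096);   // 4096 is an assumed upper bound
```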

And maybe using immutable structs instead of mutable classes can ease the fragmentation.
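
Something along these lines, with purely illustrative field names:

```csharp
// An immutable struct stored inline in an array or in a dictionary's entries array
// creates no separate heap object per element, so millions of them add nothing to
// fragmentation.
public readonly struct PricePoint
{
    public PricePoint(long instrumentId, decimal price, int quantity)
    {
        InstrumentId = instrumentId;
        Price = price;
        Quantity = quantity;
    }

    public long InstrumentId { get; }
    public decimal Price { get; }
    public int Quantity { get; }
}
```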

Upvotes: 1
