Reputation: 2722
I have an out-of-memory problem with Mono/C# that I can't seem to solve when I use parallel tasks.
Background: I have a data-processing-intensive application that reads chunks of data from a file. Each chunk is read as a byte array, which is passed to a new StreamReader instance so that it can be consumed/processed by a thread working as a parallel task. On the Microsoft CLR, this works perfectly, and the memory stays beneath ~200 MB while processing this file.
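For illustration, the producer side presumably looks something like the hypothetical sketch below. The actual GetStreamReaderForSequences, its signature, and its chunking rules are not shown in the question; the file path parameter and record-alignment details here are assumptions.
using System.Collections.Generic;
using System.IO;

//Hypothetical sketch only: read a fixed-size chunk into a fresh byte[],
//wrap it in a MemoryStream/StreamReader, and yield it lazily.
static IEnumerable<StreamReader> GetStreamReaderForSequences(string path, int chunkSize)
{
    using (var file = File.OpenRead(path))
    {
        while (true)
        {
            //A fresh byte[] per chunk; once the consumer disposes the
            //StreamReader, nothing should reference this array anymore.
            var buffer = new byte[chunkSize];
            int read = file.Read(buffer, 0, buffer.Length);
            if (read == 0)
                yield break;
            yield return new StreamReader(new MemoryStream(buffer, 0, read));
        }
    }
}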
However, on Mono, the memory used by the process does not stay in the same range throughout the processing of the file; it increases linearly until the process exceeds the 32-bit address-space limit and fails with an out-of-memory error. I cannot understand why, and am trying to solve this problem.
I used the profiler and heap shot tools to figure out what was consuming so much memory. It appears that the byte arrays filled with data from chunks of the file stay around longer than they should (though they are occasionally collected), and as a result the program runs out of memory. I have tried to figure out what is keeping a reference to them using the heap shot profiler, but it lists several "unknown" types and I have no idea what this means. I have tried to ensure that everything is disposed or set to null after use, and clearly in the MS runtime the arrays can be and are collected. If anyone knows how to interpret these unknowns in heap shot, or how to further diagnose/solve this issue, it would be greatly appreciated. For reference, the heap shot screen view and a code snippet of the task are shown below.
//Run parallel tasks: the enumerator in this foreach statement produces byte[] types
//and feeds them to a stream reader that it "yields"
Parallel.ForEach(FQP.GetStreamReaderForSequences(700000), FR =>
{
    //next code bits that process the FR variable
    //(which is a StreamReader wrapping a byte[])
    ....
    //Now I dispose of the StreamReader
    FR.Dispose();
    FR = null;
    //This didn't help, but ideally there should be no more references to the byte[] type here.
    GC.Collect();
});
Upvotes: 1
Views: 407
Reputation: 2722
Mono by default grows the size of the partition each time a new task is requested. Therefore, it will run out of memory if a large dataset is being enumerated in a parallel query. As a result, you must create your own custom partitioner, as described here:
http://msdn.microsoft.com/en-us/library/vstudio/dd997416(v=vs.100).aspx
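Below is a minimal sketch of such a partitioner, loosely following the approach in that article: it hands each worker exactly one element per request, so no partition ever buffers a growing backlog of byte[] chunks. The class name SingleElementPartitioner is my own invention, not from the article; adapt it as needed.
using System.Collections;
using System.Collections.Generic;
using System.Collections.Concurrent;

//Hands out exactly one element at a time from a shared enumerator,
//so no worker accumulates a buffered batch of items.
class SingleElementPartitioner<T> : Partitioner<T>
{
    readonly IEnumerable<T> source;

    public SingleElementPartitioner(IEnumerable<T> source)
    {
        this.source = source;
    }

    public override bool SupportsDynamicPartitions
    {
        get { return true; }
    }

    public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
    {
        //All partitions pull from the same shared dynamic source.
        var shared = GetDynamicPartitions();
        var partitions = new List<IEnumerator<T>>(partitionCount);
        for (int i = 0; i < partitionCount; i++)
            partitions.Add(shared.GetEnumerator());
        return partitions;
    }

    public override IEnumerable<T> GetDynamicPartitions()
    {
        return new DynamicPartitions(source.GetEnumerator());
    }

    class DynamicPartitions : IEnumerable<T>
    {
        readonly IEnumerator<T> source;
        readonly object sync = new object();

        public DynamicPartitions(IEnumerator<T> source)
        {
            this.source = source;
        }

        public IEnumerator<T> GetEnumerator()
        {
            while (true)
            {
                T item;
                //Serialize access to the underlying enumerator; each
                //MoveNext fetches exactly one element for one worker.
                lock (sync)
                {
                    if (!source.MoveNext())
                        yield break;
                    item = source.Current;
                }
                yield return item;
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}
Usage, replacing the raw enumerable in the original Parallel.ForEach:
Parallel.ForEach(
    new SingleElementPartitioner<StreamReader>(FQP.GetStreamReaderForSequences(700000)),
    FR =>
    {
        //...process and dispose FR as before
    });
On .NET 4.5 and later, Partitioner.Create(source, EnumerablePartitionerOptions.NoBuffering) gives the same one-element-at-a-time behavior without a custom class, but that overload may not be available on the Mono version in question.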
Upvotes: 1