Reputation: 2460
I am working on a program which exhaustively searches a large number of permutations and runs a test on each one to accumulate stats based on the category each item qualifies for.
The search input is generated on the fly by a method that returns an IEnumerable via yield return, so it isn't accumulating the entire dataset at once; rather, it generates items as they are consumed.
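For context, the generator has roughly this shape (EnumerateItems and the plain long index are simplified stand-ins for my actual permutation logic):

static IEnumerable<long> EnumerateItems()
{
    // Each candidate is produced only when the consumer asks for the
    // next item; the generator itself buffers nothing up front.
    for (long i = 0; i < 100000000000L; i++)
        yield return i;
}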
This works just fine for smallish search spaces, but when I get into the hundreds of billions of records to check, the process of generating the test cases almost immediately exhausts the 16 GB of RAM on my machine and starts thrashing the computer with page faults before any processing can begin.
What appears to be happening is that the Parallel.ForEach is greedily attempting to enumerate the entire input set before starting to process anything.
So, my question is, how do I limit the rate at which the Parallel.ForEach requests input data from my test case generator?
I've tried using the ChunkPartitioner from the Samples for Parallel Programming with the .NET Framework, with the idea that the ForEach would stop asking for more partitions once it had fully saturated the thread pool with work, but that does not appear to be the case.
I have searched all over the internet for an explanation of how Parallel.ForEach consumes input and ways I can influence this process, but I can't find anything other than ways to partition the input set.
Am I just approaching this problem wrong? Is there an alternative pattern that would work better for this type of problem?
Upvotes: 1
Views: 198
Reputation: 9587
I am seeing two (potential) problems here. One is that the default enumerable partitioner is, indeed, greedy (meaning that it tries to materialise items which are way ahead in the queue). This is easy to fix using EnumerablePartitionerOptions, available in .NET 4.5:
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Wrap the generator in a non-buffering partitioner so that items are
// materialised only when a worker thread requests them.
var partitioner = Partitioner.Create(
    EnumerateItems(), // IEnumerable<T>
    EnumerablePartitionerOptions.NoBuffering);

Parallel.ForEach(partitioner, new ParallelOptions { MaxDegreeOfParallelism = 4 }, i =>
{
    // Process item.
});
The second problem might manifest itself if your work is not CPU-bound, ultimately causing Parallel.ForEach to ramp up the number of worker threads. This is addressed by specifying MaxDegreeOfParallelism, which prevents it from saturating the thread pool.
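To see the difference for yourself, here is a minimal sketch (the instrumented generator and the early Stop are purely illustrative) that counts how many items the loop actually pulls from the source:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Demo
{
    static long produced;

    // Instrumented generator: counts every item pulled by the loop.
    static IEnumerable<long> EnumerateItems()
    {
        for (long i = 0; i < long.MaxValue; i++)
        {
            Interlocked.Increment(ref produced);
            yield return i;
        }
    }

    static void Main()
    {
        var partitioner = Partitioner.Create(
            EnumerateItems(),
            EnumerablePartitionerOptions.NoBuffering);

        Parallel.ForEach(
            partitioner,
            new ParallelOptions { MaxDegreeOfParallelism = 4 },
            (item, state) =>
            {
                Thread.Sleep(10); // simulate per-item work
                if (item == 1000) state.Stop(); // end the demo early
            });

        // With NoBuffering, this count stays close to the number of items
        // actually processed rather than running billions ahead.
        Console.WriteLine("Items pulled from generator: " + produced);
    }
}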
Upvotes: 4