eeyore22

Reputation: 13

TPL's Parallel.For creates and destroys threads too much. How do I keep the number of threads constant?

I have some code that uses Parallel.For with thread-local variables. It's basically a summation over a large array, but the elements of the array are computed inside the for loop at the same time they are being summed.

The problem I'm having is that my thread-local variables are very, very heavy objects; it's not uncommon for them to take up 200 MB of memory. I noticed my program's memory usage would spike to 2 GB, then the GC would bring it back down to 200 MB, and it kept going up and down like that, which suggested a lot of temporaries were being allocated. Since I need several thread-local variables, I've wrapped them in a struct. That let me put a Console.WriteLine in the constructor, and I saw many more of these objects being created than the one construction per core I expected. How can I force it to create exactly (numberOfCores) threads and keep only those around until the end?

I added

ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 2;

which helped only slightly; I still get too many struct constructions. It looks like there is something I can do with options.TaskScheduler, but I can't work out how much it can actually control. It also looks like I can roll my own scheduler, which is almost scary, and I'd rather not do that if possible.
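The behavior described above can be reproduced without the matrix code. Parallel.For's documented contract is that localInit runs once per task that takes part in the loop, not once per thread or core, so MaxDegreeOfParallelism caps concurrency but not the number of localInit calls. The sketch below is a hypothetical LocalInitCounter helper (not from the question) that counts those calls; the long subtotal stands in for ThreadLocalData.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: counts how often Parallel.For calls localInit.
public static class LocalInitCounter
{
    public static (int InitCalls, long Total) Run(int iterations, int maxDegree)
    {
        int initCalls = 0;
        long total = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Parallel.For<long>(0, iterations, options,
            // localInit: stands in for the expensive ThreadLocalData constructor.
            () =>
            {
                Interlocked.Increment(ref initCalls);
                return 0L;
            },
            // body: add the index to this task's running subtotal.
            (i, loopState, subtotal) => subtotal + i,
            // localFinally: merge each task's subtotal once.
            subtotal => Interlocked.Add(ref total, subtotal));

        return (initCalls, total);
    }
}
```

Comparing Run(largeCount, 2).InitCalls with the degree of parallelism on a long-running loop is a cheap way to see the effect; no particular count is guaranteed.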

Here is the relevant section of code in my program.

ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 2;

Parallel.For<ThreadLocalData>(0, m, options,
    // Thread local variable initialization
    () => new ThreadLocalData(new DenseMatrix(r * r, r * r, 0),
                              new DenseMatrix(r * r, r * r, 0),
                              new DenseMatrix(r, r, 0)),
    // Loop body: runs once per row, reusing the current task's local data
    (row, loop, threadLocalData) =>
    {
        threadLocalData.kronProductRight.Clear();
        for (int column = 0; column < n; ++column)
        {
            if ((int)E[row, column] == 1)
                threadLocalData.kronProductRight.Add(Yblocks[column], threadLocalData.kronProductRight);
        }
        MathNetAdditions.KroneckerProduct(Xblocks[row], threadLocalData.kronProductRight, threadLocalData.kronProduct);
        threadLocalData.subtotal.Add(threadLocalData.kronProduct, threadLocalData.subtotal);
        return threadLocalData;
    },
    // localFinally: merge each task's subtotal into A under the lock
    (threadLocalData) =>
    {
        lock (mutex)
            A.Add(threadLocalData.subtotal, A);
    }
);
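One way to get exactly one construction per worker, without a custom scheduler, is to stop relying on localInit and partition the rows yourself: run one Parallel.For iteration per worker, build the heavy state at the top of that iteration, and loop over a contiguous slice of rows inside it. The sketch below assumes a hypothetical Scratch class in place of ThreadLocalData and a placeholder per-row computation; in the code above, the Kronecker-product logic would go inside the inner loop and the lock/A.Add at the end of each worker.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FixedWorkerSum
{
    // Hypothetical stand-in for ThreadLocalData: an expensive scratch buffer
    // plus a running subtotal, reused for every row the worker handles.
    private sealed class Scratch
    {
        public readonly double[] Buffer;
        public double Subtotal;
        public Scratch(int size) { Buffer = new double[size]; }
    }

    // Splits rows [0, m) into `workers` contiguous slices and processes each slice
    // in one Parallel.For iteration, so exactly `workers` Scratch objects are built.
    public static double Sum(int m, int scratchSize, int workers, out int constructions)
    {
        var gate = new object();
        double total = 0;
        int built = 0;

        Parallel.For(0, workers, w =>
        {
            var scratch = new Scratch(scratchSize); // built once per worker
            Interlocked.Increment(ref built);

            int start = (int)((long)m * w / workers);
            int end = (int)((long)m * (w + 1) / workers);
            for (int row = start; row < end; row++)
            {
                // Placeholder per-row work; the real loop body would go here.
                Array.Clear(scratch.Buffer, 0, scratch.Buffer.Length);
                scratch.Buffer[0] = row;
                scratch.Subtotal += scratch.Buffer[0];
            }

            // Merge this worker's subtotal once, like the original localFinally.
            lock (gate) total += scratch.Subtotal;
        });

        constructions = built;
        return total;
    }
}
```

With workers set to Environment.ProcessorCount, the construction count equals the worker count by design. The trade-off is static partitioning: if some rows cost much more than others, one worker can finish late, so using a few more slices than cores (still a fixed, known number) is a common compromise.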

Upvotes: 1

Answers (2)

Kenneth Ito

Reputation: 5261

Check out this article: http://blogs.msdn.com/b/pfxteam/archive/2010/10/21/10079121.aspx, especially the parts about Parallel.For having a performance issue when the initialization (localInit) delegate is expensive.

From looking at the code above it's hard to tell, but it looks like you should be able to separate the computational/data parts of your ThreadLocalData from its stateful/mutating parts. Ideally, you would pass a reference to an immutable version of ThreadLocalData to whatever is crunching your numbers. That way, no matter how many tasks Parallel.For creates, you're only dealing with one instance.
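A related technique (object recycling, not exactly the immutable-data split suggested above) keeps the existing localInit/localFinally shape but stops allocating a fresh heavy object on every localInit call: localInit takes a free object from a pool, and localFinally merges its subtotal, resets it, and returns it. The sketch below uses a hypothetical Heavy class in place of ThreadLocalData. The number of allocations then tends to track how many tasks are alive at once rather than how many tasks the loop starts, though that is not a strict guarantee, since a replacement task's localInit can run before an old task's localFinally.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class PooledLocalState
{
    // Hypothetical heavy per-task state (stands in for ThreadLocalData).
    private sealed class Heavy
    {
        public readonly double[] Scratch;
        public double Subtotal;
        public Heavy(int size) { Scratch = new double[size]; }
    }

    public static double Sum(int m, int size, int maxDegree, out int allocated)
    {
        var pool = new ConcurrentBag<Heavy>();
        var gate = new object();
        double total = 0;
        int created = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Parallel.For<Heavy>(0, m, options,
            // localInit: reuse a pooled object if one is free; allocate only otherwise.
            () =>
            {
                if (pool.TryTake(out var h)) return h;
                Interlocked.Increment(ref created);
                return new Heavy(size);
            },
            // body: placeholder per-row work that reuses the pooled buffer.
            (row, loopState, h) =>
            {
                h.Scratch[0] = row;
                h.Subtotal += h.Scratch[0];
                return h;
            },
            // localFinally: merge, reset, and hand the object back to the pool.
            h =>
            {
                lock (gate) total += h.Subtotal;
                h.Subtotal = 0;
                pool.Add(h);
            });

        allocated = created;
        return total;
    }
}
```

Resetting the subtotal before returning the object to the pool matters: otherwise a recycled object would carry its old partial sum into the next task and it would be counted twice.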

Upvotes: 1

Ohad Schneider

Reputation: 38116

I haven't gotten into the thick of your question (and it seems you are asking the wrong question as phoog pointed out), but to answer your specific question:

How can I force it to create exactly (numberOfCores) threads and keep only those around until the end?

Here is a scheduler that does exactly this:

http://blog.abodit.com/2010/11/task-parallel-library-a-scheduler-with-priority-apartment-state-and-maximum-degree-of-parallelism/
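The scheduler from the linked post is not reproduced here. As a hedged illustration of the idea only, the sketch below is a minimal, hypothetical FixedThreadScheduler that creates a fixed number of dedicated threads up front and runs every queued task on them, refusing to inline work onto any other thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler: all tasks run on `threadCount` dedicated threads.
public sealed class FixedThreadScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly List<Thread> _threads;

    public FixedThreadScheduler(int threadCount)
    {
        if (threadCount < 1) throw new ArgumentOutOfRangeException(nameof(threadCount));
        _threads = Enumerable.Range(0, threadCount).Select(i =>
        {
            var t = new Thread(() =>
            {
                // Each thread drains the queue until CompleteAdding is called.
                foreach (var task in _queue.GetConsumingEnumerable())
                    TryExecuteTask(task);
            }) { IsBackground = true, Name = "FixedWorker" + i };
            t.Start();
            return t;
        }).ToList();
    }

    public override int MaximumConcurrencyLevel => _threads.Count;

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Only inline on one of our own threads, so work never leaves the fixed set.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return _threads.Contains(Thread.CurrentThread) && TryExecuteTask(task);
    }

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (var t in _threads) t.Join();
        _queue.Dispose();
    }
}
```

Passing an instance through options.TaskScheduler keeps all loop work on those threads. On its own it does not guarantee exactly threadCount localInit calls, because Parallel.For may retire and replace its worker tasks during a long loop; the thread count is fixed, but the number of ThreadLocalData instances is governed by how many tasks the loop starts.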

Upvotes: 0