pikkabird

Reputation: 137

Insufficient parallelization with Task library - possible solutions?

A .NET Framework 4.7 program iterates over a list of 50,000+ items and processes them, like so:

var taskList = new List<Task<bool>>();

foreach (var item in itemList)
{
    taskList.Add(Process(item));
}
Task.WaitAll(taskList.ToArray());

The executable runs on a single virtual box with 8 virtual processors, and takes 3-4 hours to run.

Based on timestamp analysis, processing appears to be spread evenly over these 3-4 hours, with a gap of a couple hundred milliseconds between the start times of consecutive items.

Dynatrace is showing logical threads capped at around 25 during execution.

What would be the best way to achieve a higher degree of parallelization, to bring down the total processing time?

Not looking at a major overhaul at this point; only at getting Tasks to spin up more efficiently.

Thank you.

Upvotes: 0

Views: 50

Answers (1)

StriplingWarrior

Reputation: 156524

The code you've shown doesn't appear to be doing any parallel processing. If Process is truly async then you're probably getting a high level of concurrency from it, but not parallelism.
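
One way to check whether Process is "truly async" is to time how long the call takes to hand back its Task. The sketch below is only illustrative: the Process shown here is a hypothetical stand-in (the real one isn't in the question), and AsyncCheck and TimeToReturnTask are names made up for this example. If the call itself takes a couple hundred milliseconds to return, the foreach loop is effectively starting items one at a time, which would match the start-time gaps described in the question.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

static class AsyncCheck
{
    // Hypothetical stand-in for the question's Process method (assumption):
    // it does blocking work before it reaches its first real await.
    static async Task<bool> Process(int item)
    {
        Thread.Sleep(200);       // synchronous part: blocks the calling thread
        await Task.Delay(100);   // asynchronous part: the caller is free again
        return true;
    }

    // Measures how long the call takes to return its Task,
    // not how long the Task takes to complete.
    public static TimeSpan TimeToReturnTask(Func<Task<bool>> start, out Task<bool> task)
    {
        var sw = Stopwatch.StartNew();
        task = start();
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        Task<bool> task;
        TimeSpan blocked = TimeToReturnTask(() => Process(1), out task);
        Console.WriteLine("Call returned after " + blocked.TotalMilliseconds.ToString("F0")
            + " ms; already completed: " + task.IsCompleted);
        task.Wait();
    }
}
```

If the measured time is close to zero, the method yields quickly and the loop is not the bottleneck; if it is close to the per-item gap you see in your logs, the synchronous portion of Process is what serializes the work.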

A shorter version of what you're doing now looks like this:

var taskList = itemList
    .Select(item => Process(item))
    .ToArray();
Task.WaitAll(taskList);

If Process spends a lot of time CPU-bound, you could introduce parallelism like this:

var taskList = itemList
    .AsParallel()                    // Do it in parallel
    .Select(item => Process(item))
    .ToArray();
Task.WaitAll(taskList);

You can play with different degrees of parallelism to arrive at a number of concurrent threads that seems to provide the best throughput for the work you're doing, like this:

var taskList = itemList
    .AsParallel()
    .WithDegreeOfParallelism(50)      // Up to 50 concurrently executing tasks
    .Select(item => Process(item))
    .ToArray();
Task.WaitAll(taskList);

But if Process is spending most of its time waiting for results to come back from an asynchronous operation, it probably won't make a significant difference to make it run in parallel threads. In that case you'll need to investigate what's making your individual operations take so long. For example, maybe you're running into a connection pool limit on network requests to the same destination. Maybe a REST service is throttling your requests. Or maybe your hard drive is thrashing. Hard to know without understanding the details.
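
As an illustration of the connection pool possibility: on .NET Framework, outgoing HTTP connections per host are capped by ServicePointManager.DefaultConnectionLimit, which is typically 2 for non-ASP.NET applications. The sketch below assumes that Process makes HTTP calls, which the question does not confirm, and the value 50 is an arbitrary number chosen for the example rather than a recommendation.

```csharp
using System;
using System.Net;

static class ConnectionLimitCheck
{
    static void Main()
    {
        // Inspect the current per-host limit for outgoing HTTP connections.
        Console.WriteLine("Default connection limit: "
            + ServicePointManager.DefaultConnectionLimit);

        // Assumption: Process issues HTTP requests to the same host.
        // 50 is an arbitrary illustrative value; tune it by measurement.
        ServicePointManager.DefaultConnectionLimit = 50;

        Console.WriteLine("New connection limit: "
            + ServicePointManager.DefaultConnectionLimit);
    }
}
```

The limit needs to be set early in startup, before the first request to a host is made, because connections already set up for that host keep the limit they were created with. If raising it doesn't change throughput, the bottleneck is likely elsewhere, such as throttling on the remote service or disk contention.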

Upvotes: 1
