Reputation: 137
A .NET Framework 4.7 program iterates over a list of 50,000+ items and processes them, like so:
var taskList = new List<Task<bool>>();
foreach (var item in itemList)
{
    taskList.Add(Process(item));
}
Task.WaitAll(taskList.ToArray());
The executable runs on a single virtual box with 8 virtual processors, and takes 3-4 hours to run.
Based on timestamp analysis, it appears that processing is spread over these 3-4 hours with a couple hundred milliseconds between the items' processing start times.
Dynatrace is showing logical threads capped at around 25 during execution.
What would be the best way to achieve a higher degree of parallelization, to bring down the total processing time?
Not looking at a major overhaul at this point; only at getting Tasks to spin up more efficiently.
Thank you.
Upvotes: 0
Views: 50
Reputation: 156524
The code you've shown doesn't appear to be doing any parallel processing. If Process is truly async, then you're probably getting a high level of concurrency from it, but not parallelism.
A shorter version of what you're doing now looks like this:
var taskList = itemList
    .Select(item => Process(item))
    .ToArray();
Task.WaitAll(taskList);
If Process spends a lot of time CPU-bound, you could introduce parallelism like this:
var taskList = itemList
    .AsParallel() // Do it in parallel
    .Select(item => Process(item))
    .ToArray();
Task.WaitAll(taskList);
You can play with different degrees of parallelism to arrive at a number of concurrent threads that seems to provide the best throughput for the work you're doing, like this:
var taskList = itemList
    .AsParallel()
    .WithDegreeOfParallelism(50) // Upper bound: at most 50 items processed concurrently
    .Select(item => Process(item))
    .ToArray();
Task.WaitAll(taskList);
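To make "play with different degrees" concrete, here is a rough sketch of how you might time a small sample of the work at a few settings before committing to one. ProcessSample is a made-up CPU-bound stand-in, since the real Process isn't shown, and the class and method names are only illustrative. Swap in a representative slice of your own items.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class ParallelismProbe
{
    // Hypothetical CPU-bound stand-in for Process; the real method is not shown in the question.
    public static bool ProcessSample(int item)
    {
        double acc = 0;
        for (int i = 1; i <= 200000; i++)
        {
            acc += Math.Sqrt(i + item);
        }
        return acc > 0;
    }

    // Runs the sample workload at the given degree of parallelism and returns elapsed milliseconds.
    public static long Measure(int[] items, int degree)
    {
        var stopwatch = Stopwatch.StartNew();
        bool[] results = items
            .AsParallel()
            .WithDegreeOfParallelism(degree) // upper bound on concurrently processed items
            .Select(ProcessSample)
            .ToArray();
        stopwatch.Stop();

        if (results.Length != items.Length)
        {
            throw new InvalidOperationException("Not every item was processed.");
        }
        return stopwatch.ElapsedMilliseconds;
    }

    // Prints one timing line per candidate degree so the settings can be compared.
    public static void Report(int[] items, params int[] degrees)
    {
        foreach (int degree in degrees)
        {
            Console.WriteLine("degree {0}: {1} ms", degree, Measure(items, degree));
        }
    }
}
```

Calling something like ParallelismProbe.Report(Enumerable.Range(0, 500).ToArray(), 1, 4, 8, 16, 50) prints one timing per setting. For purely CPU-bound work on an 8-processor box I'd expect gains to flatten out somewhere around 8, but measure rather than trust that guess.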
But if Process
is spending most of its time waiting for results to come back from an asynchronous operation, it probably won't make a significant difference to make it run in parallel threads. In that case you'll need to investigate what's making your individual operations take so long. For example, maybe you're running into a connection pool limit on network requests to the same destination. Maybe a REST service is throttling your requests. Or maybe your hard drive is thrashing. Hard to know without understanding the details.
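If it does turn out to be I/O-bound, one thing you could try is controlling how many operations are in flight at once, then raising or lowering that number while you watch throughput. Below is a minimal sketch using SemaphoreSlim as the gate. That is my own suggestion, not something from your code, and ThrottledRunner, RunAsync, and the processAsync parameter are made-up names. It assumes Process really returns without blocking. If the calls are HTTP requests on .NET Framework, also check ServicePointManager.DefaultConnectionLimit, which as far as I know defaults to 2 connections per host in a console app and would cap you no matter how many tasks you start.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Runs processAsync over every item with at most maxConcurrency operations in flight.
    // processAsync is a placeholder for the real Process method.
    public static async Task<bool[]> RunAsync<T>(
        IEnumerable<T> items,
        Func<T, Task<bool>> processAsync,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            List<Task<bool>> tasks = items.Select(async item =>
            {
                // Wait (without blocking a thread) until a slot is free.
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await processAsync(item).ConfigureAwait(false);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            // Completes only after every task has finished, so disposing the gate afterwards is safe.
            return await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}
```

From your existing synchronous entry point this could be called as ThrottledRunner.RunAsync(itemList, Process, 50).GetAwaiter().GetResult(), which blocks the same way Task.WaitAll does today. If throughput stops improving as you raise the limit, the bottleneck is probably downstream rather than in how the tasks are started.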
Upvotes: 1