Reputation: 1612
I am trying to understand parallel programming and I would like my async
methods to run on multiple threads. I have written something but it does not work like I thought it should.
Code
public static async Task Main(string[] args)
{
var listAfterParallel = RunParallel(); // Running this function to return tasks
await Task.WhenAll(listAfterParallel); // I want the program exceution to stop until all tasks are returned or tasks are completed
Console.WriteLine("After Parallel Loop"); // But currently when I run program, after parallel loop command is printed first
Console.ReadLine();
}
public static async Task<ConcurrentBag<string>> RunParallel()
{
var client = new System.Net.Http.HttpClient();
client.DefaultRequestHeaders.Add("Accept", "application/json");
client.BaseAddress = new Uri("https://jsonplaceholder.typicode.com");
var list = new List<int>();
var listResults = new ConcurrentBag<string>();
for (int i = 1; i < 5; i++)
{
list.Add(i);
}
// Parallel for each branch to run await commands on multiple threads.
Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, async (index) =>
{
var response = await client.GetAsync("posts/" + index);
var contents = await response.Content.ReadAsStringAsync();
listResults.Add(contents);
Console.WriteLine(contents);
});
return listResults;
}
I would like RunParallel function to complete before "After parallel loop" is printed. Also I want my get posts method to run on multiple threads.
Any help would be appreciated!
Upvotes: 2
Views: 1555
Reputation: 43545
Take a look at this: There Is No Thread
When you are making multiple concurrent web requests it's not your CPU that is doing the hard work. It's the CPU of the web server that is serving your requests. Your CPU is doing nothing during this time. It's not in a special "Wait-state" or something. The hardware inside your box that is working is your network card, that writes data to your RAM. When the response is received then your CPU will be notified about the arrived data, so it can do something with them.
You need parallelism when you have heavy work to do inside your box, not when you want the heavy work to be done by the external world. From the point of view of your CPU, even your hard disk is part of the external world. So everything that applies to web requests, applies also to requests targeting filesystems and databases. These workloads are called I/O bound, to be distinguished from the so called CPU bound workloads.
For I/O bound workloads the tool offered by the .NET platform is the asynchronous Task
. There are multiple APIs throughout the libraries that return Task
objects. To achieve concurrency you typically start multiple tasks and then await
them with Task.WhenAll
. There are also more advanced tools like the TPL Dataflow library, that is build on top of Tasks
. It offers capabilities like buffering, batching, configuring the maximum degree of concurrency, and much more.
Upvotes: 1
Reputation: 10708
What's happening here is that you're never waiting for the Parallel.ForEach
block to complete - you're just returning the bag that it will eventually pump into. The reason for this is that because Parallel.ForEach
expects Action
delegates, you've created a lambda which returns void
rather than Task
. While async void
methods are valid, they generally continue their work on a new thread and return to the caller as soon as they await
a Task, and the Parallel.ForEach
method therefore thinks the handler is done, even though it's kicked that remaining work off into a separate thread.
Instead, use a synchronous method here;
Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, index =>
{
var response = client.GetAsync("posts/" + index).Result;
var contents = response.Content.ReadAsStringAsync().Result;
listResults.Add(contents);
Console.WriteLine(contents);
});
If you absolutely must use await
inside, Wrap it in Task.Run(...).GetAwaiter().GetResult()
;
Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, index => Task.Run(async () =>
{
var response = await client.GetAsync("posts/" + index);
var contents = await response.Content.ReadAsStringAsync();
listResults.Add(contents);
Console.WriteLine(contents);
}).GetAwaiter().GetResult();
In this case, however, Task.run
generally goes to a new thread, so we've subverted most of the control of Parallel.ForEach; it's better to use async
all the way down;
var tasks = list.Select(async (index) => {
var response = await client.GetAsync("posts/" + index);
var contents = await response.Content.ReadAsStringAsync();
listResults.Add(contents);
Console.WriteLine(contents);
});
await Task.WhenAll(tasks);
Since Select
expects a Func<T, TResult>
, it will interpret an async
lambda with no return
as an async Task
method instead of async void
, and thus give us something we can explicitly await
Upvotes: 1