Reputation: 4116
I'm no expert at making the best use of resources, so I'm looking for the best way to run a task in parallel and efficiently.
We have a scenario where we have to ping millions of systems and receive a response from each. Processing a response takes almost no computation; the work is network-bound.
My current implementation looks like this:
Parallel.ForEach(list, ip =>
{
    try
    {
        // var record = client.QueryAsync(ip);
        var record = client.Query(ip);
        results.Add(record);
    }
    catch (Exception)
    {
        failed.Add(ip);
    }
});
I tested this code for
I need to process close to 20M queries. What strategy should I use to speed this up further?
Upvotes: 1
Views: 259
Reputation: 81493
Here is the problem: Parallel.ForEach uses the thread pool. Moreover, I/O-bound operations will block those threads while they wait for a device to respond, tying up resources.
In this case, client.Query is clearly I/O, so the ideal consuming code would be asynchronous.
Since you said there was an async version, you are best off using the async/await pattern and/or some type of limit on concurrent tasks. Another neat solution is to use the ActionBlock class in the TPL Dataflow library.
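To illustrate the "async/await plus a limit on concurrent tasks" option, here is a minimal sketch using SemaphoreSlim. It is not the asker's code: the ThrottledQueries class, the RunAsync helper, and the QueryAsync stub (which only simulates latency with Task.Delay) are hypothetical names, and addresses are plain strings to keep the example self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledQueries
{
    // Hypothetical stand-in for client.QueryAsync; it only simulates network latency.
    public static async Task<string> QueryAsync(string ip)
    {
        await Task.Delay(10);
        return "record for " + ip;
    }

    // Runs the query for every address, with at most maxConcurrency queries in flight.
    public static async Task<(ConcurrentBag<string> Results, ConcurrentBag<string> Failed)> RunAsync(
        IEnumerable<string> ips, Func<string, Task<string>> query, int maxConcurrency)
    {
        // Concurrent collections, so no explicit locking is needed when adding.
        var results = new ConcurrentBag<string>();
        var failed = new ConcurrentBag<string>();
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = ips.Select(async ip =>
        {
            await gate.WaitAsync();          // wait for a free slot without blocking a thread
            try
            {
                results.Add(await query(ip));
            }
            catch (Exception)
            {
                failed.Add(ip);
            }
            finally
            {
                gate.Release();              // free the slot for the next address
            }
        }).ToList();

        await Task.WhenAll(tasks);
        return (results, failed);
    }
}
```

Note that this creates one task per address up front, so for something like 20M addresses you would probably want to feed it in batches rather than all at once.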
Dataflow example
public static async Task DoWorkLoads(List<IPAddress> addresses)
{
    var options = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 50
    };
    var block = new ActionBlock<IPAddress>(MyMethodAsync, options);
    foreach (var ip in addresses)
        block.Post(ip);
    block.Complete();
    await block.Completion;
}
...
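// static so that the static DoWorkLoads method can pass it to the ActionBlock
public static async Task MyMethodAsync(IPAddress ip)
{
    try
    {
        // await the async version of the query rather than the blocking one
        var record = await client.QueryAsync(ip);
        // note: this is not thread safe; lock it or use a concurrent collection
        results.Add(record);
    }
    catch (Exception)
    {
        // note: this is not thread safe; lock it or use a concurrent collection
        failed.Add(ip);
    }
}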
This approach gives you asynchrony and MaxDegreeOfParallelism, and it lets I/O be I/O without tying up unnecessary resources.
*Disclaimer: Dataflow may not be where you want to be; I just thought I'd give you some more information.
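For a list of around 20M addresses, two adjustments to the Dataflow example may be worth considering: thread-safe collections instead of locking, and a bounded input buffer so the block does not queue every address in memory at once. The sketch below is only an illustration under those assumptions. The BoundedDataflow class and RunAsync helper are hypothetical names, the query is passed in as a delegate standing in for client.QueryAsync, addresses are plain strings, and the BoundedCapacity value of 1000 is arbitrary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedDataflow
{
    public static async Task<(ConcurrentBag<string> Results, ConcurrentBag<string> Failed)> RunAsync(
        IEnumerable<string> ips, Func<string, Task<string>> query)
    {
        // ConcurrentBag is safe to add to from several workers at once.
        var results = new ConcurrentBag<string>();
        var failed = new ConcurrentBag<string>();

        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 50,
            BoundedCapacity = 1000 // arbitrary illustrative limit on buffered items
        };

        var block = new ActionBlock<string>(async ip =>
        {
            try
            {
                results.Add(await query(ip));
            }
            catch (Exception)
            {
                failed.Add(ip);
            }
        }, options);

        foreach (var ip in ips)
        {
            // Unlike Post, SendAsync waits asynchronously while the buffer is full,
            // so the producer is throttled to the pace of the workers.
            await block.SendAsync(ip);
        }

        block.Complete();
        await block.Completion;
        return (results, failed);
    }
}
```

With a bounded block, Post would return false and drop the item when the buffer is full, which is why the sketch uses SendAsync.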
Update
I just did some benchmarking with Parallel.ForEach and Dataflow.
Run multiple times, 10000 pings:
Parallel.ForEach = 30 seconds
Dataflow = 10 seconds
Upvotes: 4