Muds

Reputation: 4116

Multithreading/Concurrent strategy for a network based task

I am not experienced at using resources efficiently, so I am looking for the best way to run a task in parallel.

We have a scenario in which we have to ping millions of systems and receive a response from each. Processing a response takes almost no computation; the task is network bound.

My current implementation looks like this:

Parallel.ForEach(list, ip =>
{
    try
    {
        // var record = client.QueryAsync(ip);
        var record = client.Query(ip);
        results.Add(record);
    }
    catch (Exception)
    {
        failed.Add(ip);
    }
});

I tested this code for

I need to process close to 20M queries. What strategy should I use to speed this up further?

Upvotes: 1

Views: 259

Answers (1)

TheGeneral

Reputation: 81493

Here is the problem

Parallel.ForEach uses the thread pool. I/O-bound operations block those threads while they wait for a device to respond, which ties up resources.

  • If you have CPU-bound code, parallelism is appropriate;
  • If you have I/O-bound code, asynchrony is appropriate.

In this case, client.Query is clearly I/O, so the ideal consuming code would be asynchronous.

Since you said there is an async version, you are best off using the async/await pattern with some kind of limit on concurrent tasks. Another neat solution is the ActionBlock class in the TPL Dataflow library.
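For the first option (async/await with a limit on concurrent tasks), here is a minimal sketch using SemaphoreSlim. It is not taken from the question's code: FakeQueryAsync is a hypothetical stand-in for client.QueryAsync that only simulates latency, string is used in place of the real address type, and the ConcurrentBag collections are an assumption about how results could be stored safely.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledQueries
{
    // Hypothetical stand-in for client.QueryAsync(ip): waits briefly to mimic
    // network latency and fails for addresses ending in ".13".
    public static async Task<string> FakeQueryAsync(string ip)
    {
        await Task.Delay(10);
        if (ip.EndsWith(".13")) throw new TimeoutException(ip);
        return "ok:" + ip;
    }

    public static async Task<(ConcurrentBag<string> Results, ConcurrentBag<string> Failed)>
        RunAsync(IEnumerable<string> ips, int maxConcurrency)
    {
        // Thread-safe collections, so no explicit locking is needed
        var results = new ConcurrentBag<string>();
        var failed = new ConcurrentBag<string>();

        // At most maxConcurrency queries are in flight at any time
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = ips.Select(async ip =>
        {
            await gate.WaitAsync();      // waits asynchronously, no thread is blocked
            try
            {
                results.Add(await FakeQueryAsync(ip));
            }
            catch (Exception)
            {
                failed.Add(ip);
            }
            finally
            {
                gate.Release();          // free the slot for the next query
            }
        }).ToList();

        await Task.WhenAll(tasks);
        return (results, failed);
    }
}
```

Note that this sketch creates one Task per address up front, so for something like 20M addresses you would probably want to feed the list through in batches rather than all at once.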


Dataflow example

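// using System.Collections.Generic;
// using System.Net;
// using System.Threading.Tasks;
// using System.Threading.Tasks.Dataflow;

// Not static, because MyMethodAsync below is an instance method
public async Task DoWorkLoads(List<IPAddress> addresses)
{
   // Process at most 50 addresses concurrently
   var options = new ExecutionDataflowBlockOptions
                     {
                        MaxDegreeOfParallelism = 50
                     };

   var block = new ActionBlock<IPAddress>(MyMethodAsync, options);

   // Queue every address for processing
   foreach (var ip in addresses)
      block.Post(ip);

   // Signal that no more items will be posted, then wait for the queue to drain
   block.Complete();
   await block.Completion;
}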

...

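public async Task MyMethodAsync(IPAddress ip)
{
    try
    {
        // Await the async query so no thread is blocked while waiting on the network
        var record = await client.QueryAsync(ip);

        // Assuming results is a plain List<T>, it is not thread safe, so lock it
        lock (results)
            results.Add(record);
    }
    catch (Exception)
    {
        // Same assumption for failed: not thread safe, so lock it
        lock (failed)
            failed.Add(ip);
    }
}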

This approach gives you asynchrony and a MaxDegreeOfParallelism limit, and it lets I/O be I/O without tying up threads or other resources unnecessarily.

*Disclaimer: Dataflow may not be where you want to be; I just thought I'd give you some more information.


Demo here

Update

I just did some benchmarking with Parallel.ForEach and Dataflow.

Run multiple times with 10000 pings:

Parallel.ForEach = 30 seconds

Dataflow = 10 seconds

Upvotes: 4
