f0rt

Reputation: 1931

How to use Parallel.ForEach with Thread-Local state?

Problem: I saw two implementations of Parallel.ForEach() downloading URLs with WebClient in an article. The author suggested that in the first example, given an array of 100 URLs, 100 WebClient instances will be started and most of them will time out. He therefore proposed a second implementation that uses thread-local state, stating that "as many WebClient() objects will be spawned as we need".

Question: How does the second example ensure that no timeouts will occur? Or, in other words, how does the second example take into consideration the local limit of connections? Will the clients be reused or something?

Source:

// First example
Parallel.ForEach(urls,
    (url,loopstate,index) =>
    {
        WebClient webclient = new WebClient();
        webclient.DownloadFile(url, filenames[index]);
    });

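// Second example
Parallel.ForEach(urls,
    () => new WebClient(),                  // localInit: creates the local state
    (url, loopstate, index, webclient) =>
    {
        webclient.DownloadFile(url, filenames[index]);
        return webclient;                   // the body must return the local state for the next iteration
    },
    (webclient) => { });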

Note: Spawning WebClients on multiple threads is only for demo purposes. I know that it will be more effective with async operations.

The source comes from the following article (I simplified it a little): When Should I Use Parallel.ForEach? When Should I Use PLINQ? See the "Thread-Local state" chapter.

Upvotes: 4

Views: 1411

Answers (2)

Yuval Itzchakov

Reputation: 149626

"In other words, how does the second example take into consideration the local limit of connections? Will the clients be reused or something?"

Instead of creating a WebClient object per iteration, the second example creates one WebClient per thread. This means that if Parallel.ForEach is using 4 threads, it will create 4 instances and reuse them between iterations. Each client can thus reuse the connection it has already opened, whereas a new instance would have to wait for the other clients' connections to close.
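To make the difference concrete, here is a minimal sketch that counts how many objects each variant creates. FakeClient and CountInstances are hypothetical names, and FakeClient merely stands in for WebClient so nothing touches the network. One caveat I am fairly sure of: the localInit delegate runs once per worker task that joins the loop, so the count can be somewhat higher than the number of threads.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class LocalStateDemo
{
    // Hypothetical stand-in for WebClient; it does no IO.
    sealed class FakeClient { }

    public static (int perIteration, int perTask) CountInstances(int items)
    {
        int perIteration = 0, perTask = 0;
        var source = Enumerable.Range(0, items);

        // Variant 1: a new object for every single iteration.
        Parallel.ForEach(source, _ =>
        {
            var client = new FakeClient();
            Interlocked.Increment(ref perIteration);
        });

        // Variant 2: localInit creates one object per worker task,
        // and the body passes it along by returning it.
        Parallel.ForEach(source,
            () => { Interlocked.Increment(ref perTask); return new FakeClient(); },
            (item, loopState, client) => client,
            client => { });

        return (perIteration, perTask);
    }
}
```

Calling LocalStateDemo.CountInstances(1000) should report exactly 1000 for the first variant and a much smaller, machine-dependent number for the second.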

Ultimately, all clients compete for the same IO resource, which is bounded by the underlying ServicePointManager.DefaultConnectionLimit. The fewer connections you have open, the more time each request has to finish. This can also be resolved by raising the connection limit, which defaults to 2.
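Raising the limit is a one-liner. The sketch below wraps it in a hypothetical helper (ConnectionLimitConfig.Raise is my own name, not a framework API). It assumes the classic .NET Framework networking stack that WebClient uses; as far as I know, HttpClient on newer .NET versions does not consult this setting.

```csharp
using System.Net;

static class ConnectionLimitConfig
{
    // Sets the default number of concurrent connections allowed per host.
    // Call this once at startup, before the first request is issued.
    public static void Raise(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
    }
}
```

For example, ConnectionLimitConfig.Raise(10) would allow up to 10 concurrent connections per host for requests created afterwards.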

Generally speaking, there's no need to use multiple threads to execute concurrent IO requests. Parallelism doesn't actually help here.
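As a rough illustration of doing the same work without dedicated threads, here is a sketch that starts all requests asynchronously and caps how many are in flight with a SemaphoreSlim. AsyncDownloadSketch, RunAsync, and the fetch delegate are hypothetical names; fetch stands in for whatever async download call you use, and the throttle is my own addition rather than something from the article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class AsyncDownloadSketch
{
    public static async Task<string[]> RunAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,   // hypothetical async download delegate
        int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            // ToList() materializes the query so every task is started now.
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();      // wait for a free slot without blocking a thread
                try { return await fetch(url); }
                finally { gate.Release(); }
            }).ToList();

            // Results come back in the same order as the input URLs.
            return await Task.WhenAll(tasks);
        }
    }
}
```

No thread is blocked while a request is pending, so the concurrency is bounded by maxConcurrent rather than by how many threads the scheduler decides to create.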

Upvotes: 2

usr

Reputation: 171246

By using thread-local state, we now have one WebClient per thread, not one client per iteration.

The author's idea is that we now have fewer WebClient instances floating around and consuming resources. That argument is bogus, because WebClient instances that are not performing a call at the moment do not hold any resources. Dispose does nothing on WebClient. Wrap it in using and you are done.

You need to use PLINQ here because Parallel is prone to spawning an unlimited number of threads. With IO you need to control the degree of parallelism (DOP) yourself. Only with PLINQ can you set the exact DOP. The TPL can't know how many concurrent requests your network can sustain.
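A minimal sketch of the PLINQ approach, assuming a hypothetical FakeDownload method in place of the real WebClient call so the example runs without a network. PlinqDopDemo and Process are illustrative names; WithDegreeOfParallelism is the actual PLINQ operator for setting the DOP.

```csharp
using System;
using System.Linq;

static class PlinqDopDemo
{
    public static string[] Process(string[] urls, int dop)
    {
        return urls
            .AsParallel()
            .AsOrdered()                      // keep results in input order
            .WithDegreeOfParallelism(dop)     // explicit DOP chosen by the caller
            .Select(url => FakeDownload(url))
            .ToArray();
    }

    // Hypothetical placeholder for the actual download.
    static string FakeDownload(string url) => "downloaded:" + url;
}
```

For pure side effects such as DownloadFile, ending the query with ForAll instead of Select and ToArray would be the usual choice.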

Upvotes: 2
