UserControl

Reputation: 15159

Parallel I/O using TPL

Say there is a list of document IDs and I want to retrieve the documents from a web service. I'm new to TPL and interested in some best practices that I failed to find by searching.

Am I correct that PLINQ's AsParallel() is not suitable here, because it partitions the source ID list and then retrieves the documents within each partition one by one?

Should I use LINQ's Select() method to convert the list into a list of Task<Document> and then call WaitAll() on it?

The Parallel class and the AsParallel() extension method both use Task<T> underneath, don't they? Is it possible to pass local state into the delegates, just as I pass it to the Task(Action<Object>, Object) overload?

Upvotes: 1

Views: 486

Answers (2)

Tony Hopkinson

Reputation: 20320

I'm not sure that's a good target for parallelisation. The bottleneck is going to be the client's network connection, which all of the requests share. I can't say from here, but unless you have a lot of unused capacity (which risks hogging the network), or there's some reason a request for one document might block so that you can work on another in the meantime, I don't think you are going to get a lot out of this.

Parallelisation by web service, that would be a goer.

Upvotes: 1

usr

Reputation: 171178

Using AsParallel for IO is dangerous because you cannot precisely control the degree of parallelism (DOP). Your IO device will have a certain optimal DOP, but this will differ from the one TPL chooses.

Also, when calling network functions, I have seen TPL use many more threads than there are processors. This leads to oversaturation of the network and suboptimal throughput. It can also lead to timeouts. I would not put such a thing into production because it is so fragile.

The algorithm that TPL uses to choose the number of threads is not entirely clear to me. I think it tries to detect whether adding more threads than there are CPUs increases throughput. But IMHO it will never use fewer threads than the number of CPUs. Imagine 64 threads hammering your web service.

If you need a precise degree of parallelism, I suggest you create the required number of Tasks/Threads yourself. You can put this code into a reusable helper function ("ParallelForeachWithExactDOP").
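One way such a helper might look is sketched below. This is only an illustration, not a definitive implementation: the class name `ParallelHelpers`, the parameter names, and the choice of a shared `ConcurrentQueue<T>` drained by `dop` long-running tasks are all assumptions made for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ParallelHelpers
{
    // Hypothetical helper: processes every item of `source` using exactly
    // `dop` worker tasks, so at most `dop` calls to `body` run at once.
    public static void ParallelForeachWithExactDOP<T>(
        IEnumerable<T> source, int dop, Action<T> body)
    {
        if (dop < 1) throw new ArgumentOutOfRangeException("dop");

        // Thread-safe queue shared by all workers.
        var queue = new ConcurrentQueue<T>(source);

        Task[] workers = Enumerable.Range(0, dop)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                T item;
                // Each worker keeps pulling items until the queue is empty.
                while (queue.TryDequeue(out item))
                    body(item);
            }, TaskCreationOptions.LongRunning)) // hint: do not tie up pool threads
            .ToArray();

        // Blocks until all workers finish; an exception thrown by `body`
        // ends that worker and surfaces here as an AggregateException.
        Task.WaitAll(workers);
    }
}
```

A call such as `ParallelHelpers.ParallelForeachWithExactDOP(ids, 4, id => Download(id))` would then keep at most four requests in flight, where `Download` stands for whatever blocking call fetches one document.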

My recommendation: If you just want to run everything you have in parallel, thereby risking oversaturation and timeouts, you can indeed just use Select to spawn all tasks at once. You should only do this if you know that the number of tasks will be in a sane range (say, there are at most 10 documents).
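A minimal sketch of the "spawn everything with Select" approach follows. `FetchDocument` is a hypothetical stand-in for the real web service call, and documents are represented as plain strings purely for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class FetchAll
{
    // Hypothetical placeholder for the blocking web service request.
    static string FetchDocument(int id)
    {
        return "document " + id;
    }

    public static string[] FetchAllAtOnce(int[] ids)
    {
        // Select is lazy, so ToArray is what actually creates and
        // schedules one task per ID.
        Task<string>[] tasks = ids
            .Select(id => Task.Factory.StartNew(() => FetchDocument(id)))
            .ToArray();

        // Wait for every request to finish (or fail).
        Task.WaitAll(tasks);

        // Results come back in the same order as the input IDs.
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

Note that the tasks are queued to the default scheduler, which decides how many actually run concurrently, so this gives no firm bound on the number of simultaneous requests.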

Here is a trick that you could also use: split your documents into chunks of 10. Then, for each chunk, spawn all of its tasks at once and wait for all of them to complete. This way you have at most 10 tasks in flight at once. This method is fairly simple, but it will provide suboptimal throughput, because most of the time there are fewer than 10 tasks running and sometimes even none. Consider this to be a simple beginner's technique.
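The chunking trick could be sketched roughly like this. Again, `FetchDocument` is a hypothetical stand-in for the real request, and the names `ChunkedFetch` and `FetchInChunks` are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ChunkedFetch
{
    // Hypothetical placeholder for the blocking web service request.
    static string FetchDocument(int id)
    {
        return "document " + id;
    }

    public static List<string> FetchInChunks(IList<int> ids, int chunkSize)
    {
        if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize");

        var results = new List<string>(ids.Count);
        for (int start = 0; start < ids.Count; start += chunkSize)
        {
            // Start one task per ID in the current chunk.
            Task<string>[] chunk = ids
                .Skip(start)
                .Take(chunkSize)
                .Select(id => Task.Factory.StartNew(() => FetchDocument(id)))
                .ToArray();

            // Do not move on until the whole chunk has finished. The slowest
            // request in each chunk is what costs throughput.
            Task.WaitAll(chunk);
            results.AddRange(chunk.Select(t => t.Result));
        }
        return results;
    }
}
```

With `chunkSize` set to 10, no more than 10 requests are outstanding at any moment, at the price of idle slots while each chunk winds down.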

Upvotes: 2
