Reputation: 6152
I have some DB operations to perform and I tried using PLINQ:
someCollection.AsParallel()
    .WithCancellation(token)
    .ForAll(element => ExecuteDbOperation(element));
And I notice it is quite slow compared to:
var tasks = someCollection.Select(element =>
        Task.Run(() => ExecuteDbOperation(element), token))
    .ToList();
await Task.WhenAll(tasks);
I prefer the PLINQ syntax, but I am forced to use the second version for performance.
Can someone explain the big difference in performance?
Upvotes: 4
Views: 2583
Reputation: 244777
Both PLINQ and Parallel.ForEach() were primarily designed to deal with CPU-bound workloads, which is why they don't work so well for your IO-bound work. For some specific IO-bound work, there is an optimal degree of parallelism, but it doesn't depend on the number of CPU cores, while the degree of parallelism in PLINQ and Parallel.ForEach() does depend on the number of CPU cores, to a greater or lesser degree.
Specifically, the way PLINQ works is to use a fixed number of Tasks, by default based on the number of CPU cores on your computer. This is meant to work well for a chain of PLINQ methods, but it seems this number is smaller than the ideal degree of parallelism for your work.
On the other hand, Parallel.ForEach() delegates deciding how many Tasks to run to the ThreadPool. And as long as its threads are blocked, the ThreadPool slowly keeps adding them. The result is that, over time, Parallel.ForEach() might get closer to the ideal degree of parallelism.
The right solution is to figure out what the right degree of parallelism for your work is by measuring, and then using that.
Ideally, you would make your code asynchronous and then use some approach to limit the degree of parallelism for async code.
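For example, one common way to throttle async work is a SemaphoreSlim. This is only a sketch, assuming a hypothetical ExecuteDbOperationAsync and a limit of 16 that you'd replace with a measured value:

var throttle = new SemaphoreSlim(16); // placeholder limit; measure what your DB handles best
var tasks = someCollection.Select(async element =>
{
    await throttle.WaitAsync(token);
    try
    {
        // hypothetical async version of your method
        await ExecuteDbOperationAsync(element, token);
    }
    finally
    {
        throttle.Release();
    }
}).ToList();
await Task.WhenAll(tasks);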
Since you said you can't do that (yet), I think a decent solution might be to avoid the ThreadPool and run your work on dedicated threads (you can create those by using Task.Factory.StartNew() with TaskCreationOptions.LongRunning).
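A rough sketch of what that could look like (one way to do it, not the only one; it creates one dedicated thread per element, so it only makes sense for a modest number of elements):

// TaskCreationOptions.LongRunning tells the default scheduler not to use a
// ThreadPool thread for these tasks, so each one gets its own thread.
var tasks = someCollection.Select(element =>
        Task.Factory.StartNew(
            () => ExecuteDbOperation(element),
            token,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default))
    .ToList();
await Task.WhenAll(tasks);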
If you're okay with sticking to the ThreadPool, another solution would be to use PLINQ ForAll(), but also call WithDegreeOfParallelism().
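Which could look roughly like this (16 is just a placeholder; use the value you measured):

someCollection.AsParallel()
    .WithDegreeOfParallelism(16) // placeholder; use your measured value
    .WithCancellation(token)
    .ForAll(element => ExecuteDbOperation(element));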
Upvotes: 4
Reputation: 1120
My supposition is that this is because of the number of threads created. In the first example this number will be roughly equal to the number of cores of your computer. By contrast, the second example will create as many threads as someCollection has elements. For IO operations that's generally more efficient.

The Microsoft guide "Patterns_of_Parallel_Programming_CSharp" recommends creating more threads than the default for IO operations (p. 33):
var addrs = new[] { addr1, addr2, ..., addrN };
var pings = from addr in addrs.AsParallel().WithDegreeOfParallelism(16)
select new Ping().Send(addr);
Upvotes: 4
Reputation: 981
I believe that if you have, let's say, more than 10,000 elements, it will be better to use PLINQ, because it won't create a task for each element of your collection; it uses a Partitioner inside. Each task creation has some overhead for data initialization. The Partitioner will create only as many tasks as are optimal for the currently available cores, so it will reuse these tasks with new data to process. You can read more about it here: http://blogs.msdn.com/b/pfxteam/archive/2009/05/28/9648672.aspx
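For illustration, here is a rough sketch of the same chunking idea made explicit with Partitioner.Create (from System.Collections.Concurrent), reusing ExecuteDbOperation and someCollection from the question:

// Split the indices into ranges, so a handful of worker tasks each process
// many elements instead of one task being created per element.
var items = someCollection.ToList();
Parallel.ForEach(
    Partitioner.Create(0, items.Count),
    range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
            ExecuteDbOperation(items[i]);
    });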
Upvotes: 3