Reputation: 28423
At first I assumed that every collection class would receive an additional par method which would convert the collection to a fitting parallel data structure (like map returns the best collection for the element type in Scala 2.8).
Now it seems that some collection classes support a par method (e.g. Array) but others have toParSeq and toParIterable methods (e.g. List). This is a bit weird, since Array isn't used or recommended that often.
What is the reason for that? Wouldn't it be better to just have a par method available on all collection classes doing the "right thing"?
If I have data which might be processed in parallel, what types should I use? The traits in scala.collection or the type of the implementation directly?
Or should I prefer Arrays now, because they seem to be cheaper to parallelize?
Upvotes: 5
Views: 736
Reputation: 167871
Lists aren't that well suited for parallel processing. The reason is that to get to the end of the list, you have to walk through every single element, so the work can't be split up cheaply. Thus, you may as well treat the list as an iterator and use something more generic like toParIterable.
Any collection that has a fast index is a good candidate for parallel processing. This includes anything implementing IndexedSeqOptimized, plus trees and hash tables. Array has as fast an index as you can get, so it's a fairly natural choice. You can also use things like ArrayBuffer (which has a par method returning a ParArray).
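
To make the difference concrete, here is a minimal sketch. It assumes Scala 2.9 through 2.12, where parallel collections ship with the standard library; on 2.13+ you would need the separate scala-parallel-collections module and its CollectionConverters import. The object and method names (ParSketch, sumArray, sumBuffer, sumList) are made up for illustration. Copying the List into an Array first is just one way to get an indexable structure, not the only option.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper object, for illustration only.
object ParSketch {
  // Array is indexed, so par can split the work by index range.
  def sumArray(xs: Array[Int]): Int = xs.par.map(_ * 2).sum

  // ArrayBuffer is array-backed as well; its par yields a ParArray.
  def sumBuffer(xs: ArrayBuffer[Int]): Int = xs.par.map(_ * 2).sum

  // A List has to be walked element by element, so here it is copied
  // into an Array once before being processed in parallel.
  def sumList(xs: List[Int]): Int = xs.toArray.par.map(_ * 2).sum

  def main(args: Array[String]): Unit = {
    println(sumArray(Array.range(1, 1001)))
    println(sumBuffer(ArrayBuffer.range(1, 1001)))
    println(sumList(List.range(1, 1001)))
  }
}
```

All three calls compute the same value; the point is only where the up-front cost lands. For the List, you pay a full sequential traversal before any parallel work can start.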
Upvotes: 5