Reputation: 3758
Let's say I have a fairly large list of strings (several million items or so). Is it a good idea to run something like this:
val updatedList = myList.par.map(someAction).toList
Or would it be better to group the list before running ...par.map(...), like this:
val numberOfCores = Runtime.getRuntime.availableProcessors
val updatedList =
myList.grouped(numberOfCores).toList.par.map(_.map(someAction)).toList.flatten
UPDATE: Assume that someAction is quite expensive (compared to grouped, toList, etc.).
Upvotes: 12
Views: 1794
Reputation: 32335
As suggested, avoid calling par on a List, since that entails copying the list into a collection that can be traversed efficiently in parallel. See the Parallel Collections Overview for an explanation.
As described in the section on concrete parallel collection classes, a ParVector may be less efficient for the map operation than a ParArray, so if you're really concerned about performance, it may make sense to use a parallel array.
But if someAction is expensive enough, its computational cost will hide the sequential bottlenecks in toList and par.
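For illustration, here is a minimal sketch of the parallel-array route. It assumes Scala 2.12 or earlier, where par is built in; on 2.13+ you would need the separate scala-parallel-collections module and its CollectionConverters import. The object name, the mapInParallel helper and the body of someAction are hypothetical stand-ins, not taken from the question.

```scala
// Sketch only: assumes Scala 2.12 or earlier, where .par is built in.
// On 2.13+, add the scala-parallel-collections module and
//   import scala.collection.parallel.CollectionConverters._
object ParArraySketch {
  // Hypothetical stand-in for the question's (expensive) someAction.
  def someAction(s: String): String = s.toUpperCase

  // Copy the List into an Array once (sequential), then call par on the
  // Array to get a ParArray and map over it in parallel.
  // Element order is preserved in the result.
  def mapInParallel(items: List[String]): List[String] =
    items.toArray.par.map(someAction).toList
}
```

The toArray and toList steps are still sequential, so this only pays off when someAction dominates the running time, as noted above.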
Upvotes: 8
Reputation: 297155
Run par.map directly, as it already takes the number of cores into account. However, do not keep a List, as converting it into a parallel collection requires a full copy. Instead, use Vector.
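A rough sketch of what that might look like, assuming the data is kept in a Vector from the start. It targets Scala 2.12 or earlier; on 2.13+ par lives in the separate scala-parallel-collections module. The object name, the mapInParallel helper and the body of someAction are placeholders I made up for the example.

```scala
// Sketch only: assumes Scala 2.12 or earlier, where .par is built in.
// On 2.13+, add the scala-parallel-collections module and
//   import scala.collection.parallel.CollectionConverters._
object ParVectorSketch {
  // Hypothetical stand-in for the question's someAction.
  def someAction(s: String): String = s.reverse

  // Vector.par gives a ParVector; the default task support decides how to
  // split the work, so no manual grouping is needed.
  def mapInParallel(items: Vector[String]): Vector[String] =
    items.par.map(someAction).toVector
}
```

As a side note on the manual variant in the question: grouped(n) produces chunks of n elements each, not n chunks, so grouped(numberOfCores) on millions of items yields a very large number of tiny groups rather than one group per core.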
Upvotes: 14