Reputation: 109
I need to perform a reduceByKey on lists. What would be the fastest solution? I'm using the ::: operator to merge two lists in the reduce operation, but ::: is O(n), so I'm afraid the whole reduce will end up being O(n²).
Code example:
val rdd: RDD[(Int, List[Int])] = getMyRDD()
rdd.reduceByKey(_ ::: _)
What would be the best/most efficient solution?
Upvotes: 1
Views: 561
Reputation: 330073
The best you can do is:
rdd.groupByKey.mapValues(_.flatten.toList)
This avoids the quadratic blow-up: groupByKey collects all values for a key into a single buffer, and the lists are then concatenated once in a single flatten pass instead of repeatedly with :::.
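For context, a minimal runnable sketch of that approach on a toy RDD (the sample data and the local SparkContext setup are illustrative assumptions, not part of the question):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GroupByKeyFlatten {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("flatten-lists"))

    // Hypothetical sample data standing in for getMyRDD()
    val rdd: RDD[(Int, List[Int])] = sc.parallelize(Seq(
      (1, List(1, 2)),
      (1, List(3)),
      (2, List(4, 5))
    ))

    // groupByKey gathers all lists for a key into one Iterable,
    // then a single flatten pass builds the final List per key
    val merged = rdd.groupByKey.mapValues(_.flatten.toList)

    merged.collect().foreach(println)  // e.g. (1,List(1, 2, 3)) and (2,List(4, 5))
    sc.stop()
  }
}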
If you want map-side reduction you can use aggregateByKey:
import scala.collection.mutable.ArrayBuffer
rdd.aggregateByKey(ArrayBuffer[Int]())(_ ++= _, _ ++= _).mapValues(_.toList)
but it will usually be significantly more expensive than the first solution.
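For readability, here is the same aggregateByKey call as a sketch with the two functions named explicitly (the wrapper object and method names are only for illustration):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.rdd.RDD

object MergeLists {
  def mergeWithMapSideCombine(rdd: RDD[(Int, List[Int])]): RDD[(Int, List[Int])] = {
    // seqOp: fold one incoming List into the per-partition buffer
    val seqOp = (buf: ArrayBuffer[Int], xs: List[Int]) => buf ++= xs
    // combOp: merge buffers coming from different partitions
    val combOp = (a: ArrayBuffer[Int], b: ArrayBuffer[Int]) => a ++= b
    rdd.aggregateByKey(ArrayBuffer[Int]())(seqOp, combOp).mapValues(_.toList)
  }
}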
Upvotes: 2