Reputation: 483
Question :
1) So if i have :
SomeRDD = myRDD.map().partitionBy(new hashPartitioner(2))
How many shuffles these two commands will make? Is that 2 shuffles? 1 for the map(), and 1 for the partitionBy()? or 1 ? It's not yet clear to me...
Upvotes: 0
Views: 235
Reputation: 13001
A map does not cause a shuffle. It is a narrow transformation, i.e. the function (which is missing in your code sample) would be called on each element individually.
A shuffle would be called when you need to access data from different partitions (i.e. a wide transformation). When you do a partitionBy you tell spark that it needs to repartition according to your request and therefore may cause a shuffle.
Upvotes: 3