Spar
Spar

Reputation: 483

map().partitionBy() How many shuffles these 2 commands will do?

Question :

1) So if i have :

SomeRDD = myRDD.map().partitionBy(new hashPartitioner(2))

How many shuffles these two commands will make? Is that 2 shuffles? 1 for the map(), and 1 for the partitionBy()? or 1 ? It's not yet clear to me...

Upvotes: 0

Views: 235

Answers (1)

Assaf Mendelson
Assaf Mendelson

Reputation: 13001

A map does not cause a shuffle. It is a narrow transformation, i.e. the function (which is missing in your code sample) would be called on each element individually.

A shuffle would be called when you need to access data from different partitions (i.e. a wide transformation). When you do a partitionBy you tell spark that it needs to repartition according to your request and therefore may cause a shuffle.

Upvotes: 3

Related Questions