Reputation: 483
So suppose I have the following transformations:
myRDD = someRDD.map()
mySecondRDD = myRDD.aggregateByKey(initValue)(CombOp , MergeOp)
At this point myRDD doesn't have a partitioner, but mySecondRDD has a HashPartitioner. My questions:
1) Do I have to designate a partitioner for myRDD? And if I do, how can I pass it as an argument to aggregateByKey?
*Note that myRDD is the result of a transformation and has no partitioner
2) At the end of these two commands, shouldn't myRDD have the same partitioner as mySecondRDD instead of none?
3) How many shuffles will these 2 commands trigger?
4) If I designate a partitioner with partitionBy on myRDD and manage to pass it as an argument to aggregateByKey, will I have reduced the shuffles from 2 to 1?
I am sorry, I still don't quite get how this works.
Upvotes: 1
Views: 441
Reputation: 966
I will try to answer your questions:
You don't have to assign a partitioner explicitly. In the code you provided, Spark will assign one automatically: when a shuffle operation such as aggregateByKey() runs on an RDD without a partitioner, a default HashPartitioner is used. To specify your own partitioner, use the overload of aggregateByKey() that accepts a partitioner alongside the initial value. It will look like myRdd.aggregateByKey(initialValue, partitioner)(CombOp, MergeOp).
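A minimal sketch of both overloads, assuming an existing SparkContext sc; the sample data and the partition count 4 are illustrative:

```scala
import org.apache.spark.HashPartitioner

val someRDD = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val myRDD   = someRDD.map(identity)  // map() yields an RDD with no partitioner

// Default overload: Spark picks a HashPartitioner for you.
val auto = myRDD.aggregateByKey(0)(_ + _, _ + _)

// Explicit overload: the partitioner is passed next to the initial value.
val explicit = myRDD.aggregateByKey(0, new HashPartitioner(4))(_ + _, _ + _)
```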
Your mySecondRDD will use the partitioner from myRDD if myRDD already has one and you do not specify a new partitioner in aggregateByKey() explicitly.
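You can verify this by inspecting the partitioner field on each RDD; a sketch, again assuming a SparkContext sc:

```scala
val myRDD       = sc.parallelize(Seq(("a", 1), ("b", 2))).map(identity)
val mySecondRDD = myRDD.aggregateByKey(0)(_ + _, _ + _)

println(myRDD.partitioner)        // None, because map() drops the partitioner
println(mySecondRDD.partitioner)  // Some(org.apache.spark.HashPartitioner@...)
```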
You will have only 1 shuffle, since the map() transformation does not trigger one. The aggregateByKey(), by contrast, needs to colocate records with the same key on one machine, which requires a shuffle.
You will have only one shuffle even if you leave the code as it is.
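To illustrate question 4: pre-partitioning moves the shuffle rather than removing it. A sketch under the same assumptions (existing SparkContext sc, illustrative names):

```scala
import org.apache.spark.HashPartitioner

val part    = new HashPartitioner(4)
val someRDD = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// partitionBy() performs the one shuffle here...
val partitioned = someRDD.map(identity).partitionBy(part)

// ...and aggregateByKey() with the SAME partitioner adds no second shuffle,
// so the total stays at one either way.
val result = partitioned.aggregateByKey(0, part)(_ + _, _ + _)
```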
Upvotes: 1