Reputation: 81
I am grouping an RDD by a key.
rdd.groupBy(_.key).partitioner
=> org.apache.spark.HashPartitioner@a
I see that, by default, Spark associates a HashPartitioner
with this RDD, which is fine by me: I agree that we need some kind of partitioner to bring like keys to one executor. But later in the program I want the RDD to forget its partitioning strategy, because I want to join it with another RDD that follows a different one. How can I remove the partitioner from the RDD?
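For reference, here is a minimal sketch of the behavior being asked about. The `Record` class, key names, and app name are illustrative, not from the question. It relies on a documented Spark behavior: transformations that may change keys, such as `map` (unlike `mapValues`), produce an RDD whose `partitioner` is `None`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative record type; the question's actual type is not shown.
case class Record(key: String, value: Int)

object DropPartitionerSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("drop-partitioner").setMaster("local[*]"))

    val rdd = sc.parallelize(Seq(Record("a", 1), Record("a", 2), Record("b", 3)))

    val grouped = rdd.groupBy(_.key)
    println(grouped.partitioner)       // Some(org.apache.spark.HashPartitioner@...)

    // map may change keys, so Spark drops the partitioner here.
    val noPartitioner = grouped.map(identity)
    println(noPartitioner.partitioner) // None

    sc.stop()
  }
}
```

Note that removing the partitioner may not be necessary for the join itself: when two RDDs with different partitioners are joined, Spark picks a partitioner for the join and reshuffles whichever side does not already match it.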
Upvotes: 0
Views: 183