Reputation: 700
I have an input A, which I convert into an RDD X spread across the cluster.
I perform certain operations on it.
Then I call .repartition(1) on the output RDD.
Will my output RDD be in the same order as input A?
Does Spark handle this automatically? If yes, then how?
Upvotes: 0
Views: 77
Reputation: 170713
The documentation doesn't guarantee that order will be kept, so you can assume it won't be. If you look at the implementation, you'll see it certainly won't be (unless your original RDD already has 1 partition for some reason): repartition calls coalesce(shuffle = true), which
"Distributes elements evenly across output partitions, starting from a random partition."
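If the order matters, one common workaround is to attach an explicit position to each element before the shuffle and sort on it afterwards. The sketch below is a plain-Python toy model, not Spark code: simulate_repartition_to_one and restore_order are hypothetical helpers, and the model simply assumes the blocks from the input partitions reach the single output partition in an arbitrary order. In Spark itself, the analogous steps would probably be zipWithIndex on the input RDD and a sortBy on that index after the repartition, but treat that as a suggestion to verify rather than a tested recipe.

```python
import random


def simulate_repartition_to_one(partitions, seed=None):
    """Toy model only: merge the input partitions in an arbitrary arrival order."""
    rng = random.Random(seed)
    arrival = list(partitions)
    rng.shuffle(arrival)  # arrival order of the blocks is not under our control
    return [item for part in arrival for item in part]


def restore_order(indexed_items):
    """Sort (index, value) pairs by index and drop the index again."""
    return [value for _, value in sorted(indexed_items, key=lambda pair: pair[0])]


data = ["a", "b", "c", "d", "e", "f"]

# Attach the original position to every element before anything is shuffled.
indexed = list(enumerate(data))

# Pretend the indexed data was spread over three partitions.
partitions = [indexed[0:2], indexed[2:4], indexed[4:6]]

merged = simulate_repartition_to_one(partitions, seed=7)
restored = restore_order(merged)

print([value for _, value in merged])  # may or may not match the input order
print(restored)                        # original order, recovered via the index
```

The point of the index is that the result no longer depends on how the shuffle happens to deliver the data. The cost is an extra sort on the final partition.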
Upvotes: 1