Reputation: 6342
I know that RDD coalesce can be used to shrink RDD partitions without shuffle
from bigger size to smaller size, for example, from 100 partitions to 10 partitions.
But I don't understand how without shuffle
works, for example, I have 10 executors, and an RDD which has 10 partititons, each executor has 1 partition,If I shrink this RDD to 5 partitions, there must shuffle happen?
Upvotes: 1
Views: 1012
Reputation: 6984
RDD coalesce doesn't do any shuffle
is incorrect it doesn't do full shuffle ,rather minimize the data movement across the nodes.
So it will do some shuffle but not the full shuffle which the repartition will do. With coalesce you can only decrease the number of partitions. If you want to increase it does the repartition itself.
There is a nice blog explaining the issue
Upvotes: 1