Tom
Tom

Reputation: 6342

How does RDD coalesce work

I know that RDD coalesce can be used to shrink RDD partitions without shuffle from bigger size to smaller size, for example, from 100 partitions to 10 partitions.

But I don't understand how without shuffle works, for example, I have 10 executors, and an RDD which has 10 partititons, each executor has 1 partition,If I shrink this RDD to 5 partitions, there must shuffle happen?

Upvotes: 1

Views: 1012

Answers (1)

Avishek Bhattacharya
Avishek Bhattacharya

Reputation: 6984

RDD coalesce doesn't do any shuffle is incorrect it doesn't do full shuffle ,rather minimize the data movement across the nodes.

So it will do some shuffle but not the full shuffle which the repartition will do. With coalesce you can only decrease the number of partitions. If you want to increase it does the repartition itself.

There is a nice blog explaining the issue

Upvotes: 1

Related Questions