Reputation: 195
The difference between coalesce and repartition is fairly straightforward. If I were to coalesce a DataFrame to 1 partition and write it to a storage service (Azure Blob, AWS S3, etc.), would the entire DataFrame be sent to the driver and then to the storage service, or would an executor send it directly?
Upvotes: 1
Views: 907
Reputation: 4199
The official Spark documentation describes coalesce as follows:
If you’re doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1).
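From the above, it can be inferred that the data is not routed through the driver: the single remaining partition is processed on one executor, and that executor writes it directly to the storage service.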
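To make this concrete, here is a minimal PySpark sketch of the pattern in the question. The helper name write_as_single_file, the Parquet format, the overwrite mode, and the output path are my own assumptions for illustration and are not from the question. The point it shows is that nothing in the code calls collect(), so the rows are not pulled into the driver; the write runs as a task on an executor.

```python
def write_as_single_file(df, path):
    """Coalesce `df` to one partition and write it as Parquet to `path`.

    `df` is expected to be a pyspark.sql.DataFrame. `path` is a placeholder,
    e.g. a local directory or an object-store URI your cluster is set up for.
    """
    # coalesce(1) merges the existing partitions into one without a shuffle.
    single = df.coalesce(1)
    # The write is executed by tasks on executors. With a single partition
    # there is a single write task, so one executor writes the output.
    single.write.mode("overwrite").parquet(path)
    return single.rdd.getNumPartitions()


def demo():
    # Local demo only; requires pyspark to be installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[4]")
        .appName("coalesce-demo")
        .getOrCreate()
    )
    df = spark.range(1000).repartition(8)
    # Hypothetical output location, chosen only for this example.
    print(write_as_single_file(df, "/tmp/coalesce_demo_out"))
    spark.stop()
```

Keep in mind the caveat from the quoted documentation: with coalesce(1) the upstream computation may also run on that one node, so for large data this single task can become a bottleneck.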
Upvotes: 3