fuzzy-memory

Reputation: 195

Does coalesce(1) bring all the data to the driver?

The difference between coalesce and repartition is fairly straightforward. If I were to coalesce a DataFrame to 1 partition and write it to a storage service (Azure Blob, AWS S3, etc.), would the entire DataFrame be sent to the driver and then to the storage service, or would an executor send it directly?

Upvotes: 1

Views: 907

Answers (1)

过过招

Reputation: 4199

The Spark official documentation describes it as follows:

If you’re doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1).

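From the above it can be inferred that an executor sends the data directly: with coalesce(1) the write runs as a single task on one executor node, and the data is not routed through the driver.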

Upvotes: 3
