Ashish

Reputation: 21

Why does the "collect" action in Spark send all the data to the driver?

When we use the show, take, or write actions in Spark, is all the data sent to the driver? If not, why does all the data go to the driver when we use collect?

Upvotes: 2

Views: 1215

Answers (1)

mck

Reputation: 42352

show and take fetch only the number of rows you request (e.g. 20 rows, which is the default for show) onto the driver, while collect fetches the entire dataframe, across all partitions, onto the driver. write outputs the whole dataframe to a file location, but this is generally done in a partitioned manner: each executor writes the data in its own partitions directly to the file system, so the data does not have to pass through the driver.
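To make the difference concrete, here is a toy model in plain Python. It is not Spark code and does not use the Spark API: the `partitions` list and the `take`, `collect`, and `write` helper functions are hypothetical stand-ins I made up to mimic, roughly, where the rows end up for each action.

```python
import os
import tempfile

# Toy stand-in for a dataframe: three "partitions", each imagined to live
# on a different executor. (Hypothetical data, not a real Spark object.)
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

def take(parts, n):
    """Return the first n rows, scanning only as many partitions as needed."""
    rows, scanned = [], 0
    for part in parts:
        if len(rows) >= n:
            break
        scanned += 1
        rows.extend(part[: n - len(rows)])
    return rows, scanned

def collect(parts):
    """Bring every row of every partition back to the 'driver'."""
    return [row for part in parts for row in part]

def write(parts, out_dir):
    """Each 'executor' writes its own partition to its own file.

    Only the list of file names is returned; no rows travel to the 'driver'.
    """
    for i, part in enumerate(parts):
        path = os.path.join(out_dir, f"part-{i:05d}.txt")
        with open(path, "w") as f:
            f.writelines(f"{row}\n" for row in part)
    return sorted(os.listdir(out_dir))

taken_rows, partitions_scanned = take(partitions, 2)
collected_rows = collect(partitions)
with tempfile.TemporaryDirectory() as tmp:
    written_files = write(partitions, tmp)

print(taken_rows, partitions_scanned)  # only one partition was touched
print(collected_rows)                  # every row from every partition
print(written_files)                   # one output file per partition
```

In this sketch, take only touches the first partition, collect pulls everything into one list on the "driver", and write leaves the rows where they are and produces one file per partition. That last point is why a write can handle a dataframe far larger than driver memory, while a collect on the same dataframe may run the driver out of memory.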

Upvotes: 2
