Beefger

Reputation: 23

How does Spark perform I/O?

It is my understanding that Spark uses parallel I/O to read files. That conclusion comes from other Stack Overflow answers.

My question is: does Spark read data using an independent approach or a collective approach? In other words, does each worker read a fixed chunk of data on its own, or do the workers communicate with each other and collaborate to read the data efficiently?

Upvotes: 1

Views: 991

Answers (2)

A Khe

Reputation: 73

The workers communicate through the driver, and each worker processes its own data.

Upvotes: 1

Yugerten

Reputation: 898

Each Apache Spark worker has executors. Workers can be deployed in distributed or standalone mode.
Each worker processes its own data. For more detail see this answer or this link
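
To make "each worker processes its own data" a bit more concrete, here is a simplified pure-Python sketch of the independent approach. It is not Spark's actual code: the names plan_splits and read_split are made up for illustration, and the line-boundary handling is only modeled on how Hadoop-style line-based input splits generally work, as far as I understand it. The file is cut into byte ranges up front, and each task then reads its own range without talking to any other task.

```python
import os
import tempfile


def plan_splits(file_size, split_size):
    """Planning step (hypothetical): cut the file into byte ranges, one per task."""
    return [(start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


def read_split(path, start, end):
    """Task step (hypothetical): return the lines that begin inside [start, end).

    Each call opens the file on its own and never coordinates with other tasks.
    """
    lines = []
    with open(path, "rb") as f:
        if start > 0:
            # Back up one byte and discard through the next newline, so a line
            # that began in the previous range is left to the previous task.
            f.seek(start - 1)
            f.readline()
        # Read whole lines as long as they start before the end of this range;
        # the last one may run past `end`.
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip(b"\n").decode("utf-8"))
    return lines


if __name__ == "__main__":
    # Small demo file; the row contents are arbitrary sample data.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"".join(b"row-%d\n" % i for i in range(10)))
        demo_path = tmp.name
    for s, e in plan_splits(os.path.getsize(demo_path), 16):
        print((s, e), read_split(demo_path, s, e))
    os.remove(demo_path)
```

The point of the sketch is that the only shared information is the list of byte ranges, which is computed once before any reading starts. No task needs to know what another task has read, which is what distinguishes this from a collective read.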

Upvotes: 1
