Reputation: 18991
I'm currently taking a course on Spark and came across this definition of an executor:
Each executor will hold a chunk of the data to be processed. This chunk is called a Spark partition. It is a collection of rows that sits on one physical machine in the cluster. Executors are responsible for carrying out the work assigned by the driver. Each executor is responsible for two things: (1) executing code assigned by the driver, and (2) reporting the state of the computation back to the driver.
I am wondering what happens if the storage of the Spark cluster is smaller than the data that needs to be processed. How will the executors fetch the data to sit on the physical machines in the cluster?
The same question goes for streaming data, which is unbounded. Does Spark save all the incoming data on disk?
Upvotes: 3
Views: 517
Reputation: 3824
The Apache Spark FAQ briefly mentions the two strategies Spark may adopt:
Does my data need to fit in memory to use Spark?
No. Spark's operators spill data to disk if it does not fit in memory, allowing it to run well on any sized data. Likewise, cached datasets that do not fit in memory are either spilled to disk or recomputed on the fly when needed, as determined by the RDD's storage level.
Although Spark keeps as much data in memory as it can by default, it can be configured to cache data on disk only.
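For example, the storage level the FAQ refers to is something you choose explicitly when persisting. A minimal PySpark sketch (the input path is a placeholder, not from your course):

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-levels").getOrCreate()
df = spark.read.parquet("/data/events")  # hypothetical source path

# Keep partitions in memory and spill those that don't fit to local disk
# (this is the default level used by DataFrame.cache()).
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()       # an action materializes the cache
df.unpersist()   # release it before switching levels

# Cache on disk only: minimal memory footprint, slower reads.
df.persist(StorageLevel.DISK_ONLY)
df.count()
```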
Section 2.6.4, Behavior with Insufficient Memory, of Matei Zaharia's PhD dissertation on Spark (An Architecture for Fast and General Data Processing on Large Clusters) benchmarks the performance impact of reducing the amount of available memory.
In practice, you don't usually persist a 100 TB source dataframe; you persist only the aggregations or intermediate results that are reused.
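A sketch of that pattern, assuming hypothetical paths and column names: the large source is only scanned lazily, and only the small aggregate is cached.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("persist-aggregates").getOrCreate()

# The huge source is never cached; it is read lazily, partition by partition.
events = spark.read.parquet("/data/events")

# The aggregate is tiny by comparison, so caching it is cheap.
daily_counts = (
    events
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("n"))
    .cache()
)

daily_counts.count()  # materializes the cache once
# Subsequent actions reuse the cached aggregate instead of rescanning the source.
daily_counts.filter(F.col("event_type") == "click").show()
```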
Upvotes: 1