Vineet Salvi

Reputation: 13

How will Spark store a file of 500GB/1TB of data?

I am new to Spark and I read that Spark stores the data in memory.

Now suppose I have a machine with 256GB of RAM and a 72TB hard disk. If I load a single file of 500GB/1TB, where will Spark store the data?

Questions:

Will it store the data in Disk?

Will it store part of the data in memory and the rest on disk?

Thanks in advance

Upvotes: 1

Views: 2144

Answers (2)

Akash Sethi

Reputation: 2294

First, until you call an action, nothing happens to the file, because Spark uses lazy evaluation.

Only when you call an action does Spark actually read and process the file.
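To make the lazy-evaluation point concrete, here is a toy Python model, not Spark code. `LazyDataset` is a hypothetical class: transformations (`map`, `filter`) only record a step, and the source is read only when the action (`count`) runs. Spark's real RDD/DataFrame machinery is far more involved, but the "nothing is read until an action" behaviour is the same idea.

```python
class LazyDataset:
    """Toy model of lazy evaluation (hypothetical class, not a Spark API).

    Transformations only record a step; nothing runs until an action is called.
    """

    def __init__(self, source, steps=None):
        self.source = source        # callable that yields records when invoked
        self.steps = steps or []    # recorded, not yet executed, transformations

    def map(self, fn):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self.source, self.steps + [("map", fn)])

    def filter(self, pred):
        # Transformation: also just recorded.
        return LazyDataset(self.source, self.steps + [("filter", pred)])

    def count(self):
        # Action: only now is the source read and each recorded step applied.
        n = 0
        for record in self.source():
            keep = True
            for kind, fn in self.steps:
                if kind == "map":
                    record = fn(record)
                elif not fn(record):
                    keep = False
                    break
            if keep:
                n += 1
        return n


reads = []

def source():
    # Stand-in for reading the input file; logs every record actually read.
    for i in range(10):
        reads.append(i)
        yield i

ds = LazyDataset(source).map(lambda x: x * 2).filter(lambda x: x > 10)
print(len(reads))   # 0: no action yet, so nothing has been read
print(ds.count())   # 4
print(len(reads))   # 10: the action triggered the read
```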

Spark splits the file into several partitions and then processes each partition in memory according to your transformations and action.
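A rough back-of-the-envelope sketch of what "several partitions" means for the file sizes in the question. It assumes a splittable input file and a 128 MB split size (the usual default for HDFS blocks and for `spark.sql.files.maxPartitionBytes`); `CORES = 32` is a purely hypothetical core count. Real partition counts also depend on the file format, compression, and other settings.

```python
import math

MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

SPLIT_BYTES = 128 * MiB   # assumption: 128 MB split size
CORES = 32                # hypothetical number of tasks running at once

def estimate_partitions(file_bytes, split_bytes=SPLIT_BYTES):
    # One partition per split; the last one may be smaller than split_bytes.
    return math.ceil(file_bytes / split_bytes)

def in_flight_bytes(cores=CORES, split_bytes=SPLIT_BYTES):
    # Roughly how much raw input is being processed at the same moment:
    # one partition per running task.
    return cores * split_bytes

for label, size in [("500 GiB", 500 * GiB), ("1 TiB", 1 * TiB)]:
    print(label, "->", estimate_partitions(size), "partitions")
print("in flight:", in_flight_bytes() / GiB, "GiB")
```

Under these assumptions a 500 GiB file becomes about 4000 partitions and 1 TiB about 8192, while only around 4 GiB of raw input is being worked on at any instant, which is why the whole file never has to fit in 256GB of RAM.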

Now suppose the data is larger than the memory currently available. Spark will keep as much of it in memory as it can, spill the rest to disk, and process it accordingly.
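One place this "some in memory, the rest on disk" behaviour is explicit is caching with the `MEMORY_AND_DISK` storage level (in PySpark, `df.persist(StorageLevel.MEMORY_AND_DISK)`): partitions that do not fit in memory are written to disk instead of being dropped. The function below is a toy model of that idea only, not Spark's block manager, which also evicts and tracks blocks dynamically. The 150 GiB storage budget is a hypothetical figure, since only part of the 256GB of RAM would be available for storing cached data.

```python
def place_partitions(partition_sizes, memory_budget):
    """Toy model: keep each partition in memory while it fits, else put it on disk."""
    placement = []
    used = 0
    for size in partition_sizes:
        if used + size <= memory_budget:
            placement.append("memory")
            used += size
        else:
            placement.append("disk")   # does not fit in the remaining memory
    return placement

sizes_mib = [128] * 4000     # ~500 GiB split into 128 MiB partitions
budget_mib = 150 * 1024      # hypothetical 150 GiB storage-memory budget
placement = place_partitions(sizes_mib, budget_mib)
print(placement.count("memory"), "in memory,", placement.count("disk"), "on disk")
```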

I hope this clears your query.

Upvotes: 3

Ani Menon

Reputation: 28239

The data is stored on disk. Spark pulls it into memory only while processing it.
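A plain-Python analogy for "stays on disk, pulled into memory only while processing" (this is not Spark code): the hypothetical helper `count_lines_streaming` reads a file in fixed-size chunks, so only one chunk is held in RAM at a time no matter how large the file is. Each Spark task reading its own split works in a similar streaming fashion, though with its own readers and formats.

```python
import os
import tempfile

def count_lines_streaming(path, chunk_bytes=1024 * 1024):
    """Count newline characters while holding at most one chunk in memory."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)   # only this chunk is in RAM
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total

# Demo on a small temporary file standing in for the large input.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"row\n" * 100_000)
    demo_path = tmp.name

print(count_lines_streaming(demo_path, chunk_bytes=4096))  # 100000
os.remove(demo_path)
```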

Upvotes: 0
