Reputation: 13
I am new to Spark and I read that Spark stores the data in memory.
Suppose I have a machine with 256 GB of RAM and a 72 TB hard disk. If I load a single file of 500 GB or 1 TB, where will Spark store the data?
Query:
Will it store the data on disk?
Will it store part of the data in memory and the rest on disk?
Thanks in advance
Upvotes: 1
Views: 2144
Reputation: 2294
First, nothing happens to the file until you call an action, because Spark follows a lazy evaluation approach.
Once you call an action, Spark starts processing the file.
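To make the lazy evaluation idea concrete, here is a small plain-Python analogy using generators. It is not Spark code, and `read_lines` and `log` are made-up names for illustration only. The point is just that building the pipeline does no work; the work happens when something consumes it, much like a Spark transformation such as `map` does nothing until an action such as `count` runs.

```python
# Plain-Python analogy for lazy evaluation; this is not Spark code.
log = []  # records when a line is actually read


def read_lines(lines):
    """Hypothetical stand-in for reading a file: yields one line at a time."""
    for line in lines:
        log.append(f"read {line}")
        yield line


source = ["a", "bb", "ccc"]

# "Transformation": this only builds the pipeline, nothing is read yet.
lengths = (len(line) for line in read_lines(source))
work_before_action = len(log)

# "Action": consuming the pipeline is what triggers the reads.
total = sum(lengths)
work_after_action = len(log)
```

Here `work_before_action` stays at zero because no line has been read when the pipeline is defined, and the reads only show up in `log` after `sum` pulls the values through.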
Spark splits the file into several partitions and processes each partition in memory, applying the transformations and the action.
If a partition is larger than the currently available memory, Spark keeps as much of the data in memory as it can, puts the rest on disk, and then processes it accordingly.
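The "keep what fits in memory, put the rest on disk" behaviour can be sketched in plain Python. This is only a toy model under my own assumptions, not how Spark's memory manager actually works: `TinyStore`, `memory_budget` and the `part-N.pkl` file names are all hypothetical. In real Spark, what gets kept or spilled depends on the operation and on the storage level you ask for, for example `persist(StorageLevel.MEMORY_AND_DISK)`.

```python
import os
import pickle
import tempfile


def partition(records, size):
    """Split a list of records into consecutive chunks of at most `size`."""
    return [records[i:i + size] for i in range(0, len(records), size)]


class TinyStore:
    """Hypothetical store: holds partitions in memory up to a budget, spills the rest."""

    def __init__(self, memory_budget, spill_dir):
        self.memory_budget = memory_budget  # max number of records kept in memory
        self.spill_dir = spill_dir
        self.in_memory = {}  # partition id -> list of records
        self.on_disk = {}    # partition id -> path of the spilled file
        self.used = 0

    def put(self, pid, part):
        if self.used + len(part) <= self.memory_budget:
            # The partition still fits in the memory budget.
            self.in_memory[pid] = part
            self.used += len(part)
        else:
            # No room left: write the partition to disk instead.
            path = os.path.join(self.spill_dir, f"part-{pid}.pkl")
            with open(path, "wb") as f:
                pickle.dump(part, f)
            self.on_disk[pid] = path

    def get(self, pid):
        if pid in self.in_memory:
            return self.in_memory[pid]
        # Spilled partitions are read back only when they are needed.
        with open(self.on_disk[pid], "rb") as f:
            return pickle.load(f)


records = list(range(10))
parts = partition(records, 3)  # four partitions: three of 3 records, one of 1

with tempfile.TemporaryDirectory() as spill_dir:
    store = TinyStore(memory_budget=6, spill_dir=spill_dir)
    for pid, part in enumerate(parts):
        store.put(pid, part)
    memory_ids = sorted(store.in_memory)
    disk_ids = sorted(store.on_disk)
    # Process every partition, whether it lives in memory or on disk.
    total = sum(sum(store.get(pid)) for pid in range(len(parts)))
```

With a budget of 6 records, the first two partitions stay in memory and the last two are spilled, but the final result is the same as if everything had fit in memory.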
I hope this answers your question.
Upvotes: 3
Reputation: 28239
The data is stored on disk. Spark pulls it into memory only while processing it.
Upvotes: 0