Reputation: 51
I have a question about the following picture. I use data.persist(StorageLevel.MEMORY_AND_DISK_SER) to cache our original data, but surprisingly, caching in memory is no faster than caching on disk. Why? I would expect caching in memory to be faster than caching on disk. Can anyone help me with this problem?
Upvotes: 2
Views: 2720
Reputation: 51
Maybe you should try the StorageLevel DISK_ONLY vs. MEMORY_ONLY and increase your input data size.
Upvotes: 0
Reputation: 8314
If I am not wrong, this is because Spark is not writing directly to disk.
For the MEMORY_AND_DISK_SER persistence level, the partitions of the RDD that fit in memory are kept there in serialized form (as with MEMORY_ONLY_SER), and only the partitions that do not fit are spilled to disk.
So I presume you do not have a problem there. These timings are normal: as long as everything fits in memory, nothing is written to disk, and only once memory is full will you start to see longer times from writing data to disk.
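To make the idea concrete, here is a toy model in plain Python. It is not Spark code and does not use any Spark API: `ToyBlockStore`, `place_partitions`, and the byte sizes are all hypothetical names and numbers I made up for illustration. It also ignores things real Spark does, such as serialization and evicting older blocks, so treat it only as a rough sketch of "memory first, disk only on overflow".

```python
from dataclasses import dataclass, field


@dataclass
class ToyBlockStore:
    """Toy memory-first cache that spills to disk (an assumption, not Spark's implementation)."""
    memory_capacity: int                        # hypothetical memory budget, in bytes
    memory: dict = field(default_factory=dict)  # partition id -> size kept in memory
    disk: dict = field(default_factory=dict)    # partition id -> size spilled to disk

    def memory_used(self) -> int:
        return sum(self.memory.values())

    def put(self, partition_id: int, size: int) -> str:
        """Store one partition and report where it ended up."""
        if self.memory_used() + size <= self.memory_capacity:
            self.memory[partition_id] = size
            return "memory"
        # Only partitions that no longer fit in memory go to disk.
        self.disk[partition_id] = size
        return "disk"


def place_partitions(sizes, memory_capacity):
    """Cache partitions in order and return the location of each one."""
    store = ToyBlockStore(memory_capacity)
    return [store.put(i, size) for i, size in enumerate(sizes)]


# Small input: everything fits, so the disk is never touched.
small = place_partitions([100] * 4, memory_capacity=1000)
# Larger input: the first 10 partitions fill memory, the rest spill.
large = place_partitions([100] * 15, memory_capacity=1000)

print(small.count("disk"))                         # 0
print(large.count("memory"), large.count("disk"))  # 10 5
```

In the small case the disk is never used, which would explain why the timings look the same as pure in-memory caching. A difference should only show up once the input is larger than the available memory.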
Upvotes: 2