zjw

Reputation: 51

Spark caching: disk vs. memory

I have a question about the following picture. I use data.persist(StorageLevel.MEMORY_AND_DISK_SER) to cache our original data, but to my surprise, caching in memory is just as fast as caching on disk. Why? I expected the in-memory cache to be faster than the on-disk one. Can anyone help me with this?

Upvotes: 2

Views: 2720

Answers (2)

xiaoxinganling

Reputation: 51

Maybe you should compare StorageLevel.DISK_ONLY vs. MEMORY_ONLY and increase your input data size.
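A minimal PySpark sketch of that comparison, assuming a local PySpark installation. The helper name time_cached_counts, the input size, and the partition count are illustrative choices, not taken from the question. It persists an RDD at a given storage level, times a first count() (which computes the data and fills the cache) and a second count() (which should be served from the cache), and then unpersists it.

```python
import time


def time_cached_counts(rdd, level):
    """Hypothetical helper: persist `rdd` at `level` and time two count() calls.

    The first count() computes the data and populates the cache; the second
    count() reads the cached partitions, so its time mostly reflects the
    storage level.
    """
    rdd.persist(level)

    start = time.perf_counter()
    rdd.count()  # first action: compute + write to the cache
    first = time.perf_counter() - start

    start = time.perf_counter()
    rdd.count()  # second action: served from the cache
    second = time.perf_counter() - start

    rdd.unpersist()  # drop the cached blocks before trying another level
    return first, second


if __name__ == "__main__":
    # Assumes PySpark is installed; the data size below is an arbitrary example.
    from pyspark import SparkContext, StorageLevel

    sc = SparkContext("local[*]", "cache-level-comparison")
    for name in ("MEMORY_ONLY", "DISK_ONLY"):
        data = sc.parallelize(range(10_000_000), 8).map(lambda x: (x, x * x))
        first, second = time_cached_counts(data, getattr(StorageLevel, name))
        print(f"{name}: first count {first:.2f}s, cached count {second:.2f}s")
    sc.stop()
```

With small inputs, files just written for DISK_ONLY may still sit in the operating system's page cache, which is one plausible reason the two levels can look equally fast; a larger input makes a difference more likely to show up.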

Upvotes: 0

Rami

Reputation: 8314

If I'm not mistaken, this is because Spark does not write directly to disk.

For the MEMORY_AND_DISK_SER persistence level, the partitions of the RDD that fit into memory are kept there (as with MEMORY_ONLY_SER), and only the partitions that are too big for the remaining memory are spilled to disk.

So I don't think anything is wrong there: these times are normal. Only once your memory is full and partitions start spilling to disk will you begin to see longer times for writing the data to disk.
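To make the spill behaviour described above concrete, here is a deliberately simplified toy model in plain Python. The function place_partitions and the sizes are hypothetical, and this is not Spark's actual block-manager logic (it ignores eviction, serialization overhead, and unrolling): partitions are cached in order, and each one stays in memory only if it fits in the remaining budget; otherwise it goes to disk.

```python
def place_partitions(partition_sizes, memory_budget):
    """Toy model (hypothetical): decide where each cached partition lands.

    A partition stays in memory if it fits in the remaining budget;
    otherwise it is spilled to disk. Sizes and budget use the same units.
    """
    placement = []
    free = memory_budget
    for size in partition_sizes:
        if size <= free:
            placement.append("memory")
            free -= size  # memory used up by this cached partition
        else:
            placement.append("disk")  # too big for what is left: spill
    return placement


if __name__ == "__main__":
    # Everything fits: no partition touches the disk.
    print(place_partitions([100, 100, 100], 1000))
    # Budget runs out: only the last partition is spilled.
    print(place_partitions([100, 100, 100], 250))
```

In this model, place_partitions([100, 100, 100], 250) returns ['memory', 'memory', 'disk']: as long as everything fits, nothing is written to disk, which is consistent with seeing the same timings for both cases.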

Upvotes: 2
