anvy elizabeth

Reputation: 130

PySpark DataFrame: default StorageLevel for persist() and cache()

P is a DataFrame. I observed the following behaviour for its storageLevel:

P.cache()
P.storageLevel
StorageLevel(True, True, False, True, 1)
P.unpersist()
P.storageLevel
StorageLevel(False, False, False, False, 1)
P.persist()
P.storageLevel
StorageLevel(True, True, False, True, 1)

This shows that the default for both persist() and cache() is MEMORY_AND_DISK, but I have read in the docs that the default for cache() is MEMORY_ONLY. Please help me understand.

Upvotes: 0

Views: 485

Answers (1)

David Vrba

Reputation: 3344

From the PySpark documentation:

Note The default storage level has changed to MEMORY_AND_DISK to match Scala in 2.0.

As you can find here: Latest PySpark docs

The MEMORY_ONLY default you read about applies to RDD.cache(); since Spark 2.0, both cache() and persist() on a DataFrame default to MEMORY_AND_DISK.

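If you want MEMORY_ONLY behaviour for a DataFrame, you can pass the storage level to persist() explicitly. A minimal sketch, assuming an existing SparkSession named spark; the spark.range(10) DataFrame is just an illustrative stand-in for your data:

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-level-demo").getOrCreate()

df = spark.range(10)  # hypothetical example DataFrame

# cache() takes no arguments and uses the default, MEMORY_AND_DISK
df.cache()
print(df.storageLevel)

# persist() accepts an explicit StorageLevel;
# unpersist first, since the level of a cached DataFrame cannot be changed in place
df.unpersist()
df.persist(StorageLevel.MEMORY_ONLY)
print(df.storageLevel)

Note that you must call unpersist() before re-persisting with a different level; Spark raises an error if you try to change the storage level of an already cached DataFrame.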
Upvotes: 1
