Dmitry Petrov

Reputation: 1547

How to check if Spark RDD is in memory?

I have an instance of org.apache.spark.rdd.RDD[MyClass]. How can I programmatically check whether the instance is persisted in memory?

Upvotes: 9

Views: 3387

Answers (2)

KayV

Reputation: 13835

You can call rdd.getStorageLevel.useMemory to check whether it is marked for in-memory storage, as follows:

scala> myrdd.getStorageLevel.useMemory
res3: Boolean = false

scala> myrdd.cache()
res4: myrdd.type = MapPartitionsRDD[2] at filter at <console>:29

scala> myrdd.getStorageLevel.useMemory
res5: Boolean = true

Upvotes: 2

Justin Pihony

Reputation: 67065

You want RDD.getStorageLevel. It will return StorageLevel.NONE if the RDD is not persisted. However, that only tells you whether the RDD is *marked* for caching. If you want the actual caching status, you can use the developer APIs sc.getRDDStorageInfo or sc.getPersistentRDDs.
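To illustrate the difference between the two checks, here is a minimal sketch (assuming a live SparkContext `sc`, e.g. in spark-shell; the RDD contents are arbitrary):

```scala
import org.apache.spark.storage.StorageLevel

val rdd = sc.parallelize(1 to 100).persist(StorageLevel.MEMORY_ONLY)

// Marked for caching? True immediately after persist(), even before
// any partition has actually been computed and stored.
val marked: Boolean = rdd.getStorageLevel.useMemory

rdd.count() // materialize the RDD so the cache is populated

// Actually cached? getRDDStorageInfo reports only RDDs that currently
// have cached blocks, so this reflects the real in-memory status.
val actuallyCached: Boolean = sc.getRDDStorageInfo.exists(_.id == rdd.id)
```

Note that `getStorageLevel.useMemory` becomes true as soon as persistence is requested, while `sc.getRDDStorageInfo` only lists the RDD after an action has materialized it.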

Upvotes: 12
