thebytewalker

Reputation: 346

Will Spark cache the data twice if we cache a Dataset and then cache the same Dataset as a table?

Dataset<Row> dataSet = sqlContext.sql("some query");
dataSet.registerTempTable("temp_table");
dataSet.cache(); // cache 1
sqlContext.cacheTable("temp_table"); // cache 2

So, here my question is: will Spark cache the Dataset only once, or will there be two copies of the same data, one as a Dataset (cache 1) and the other as a table (cache 2)?

Upvotes: 1

Views: 1108

Answers (1)

Alper t. Turker

Reputation: 35249

It will not, or at least it won't in any recent version:

scala> val df = spark.range(1)
df: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> df.cache
res0: df.type = [id: bigint]

scala> df.createOrReplaceTempView("df")

scala> spark.catalog.cacheTable("df")
2018-01-23 12:33:48 WARN  CacheManager:66 - Asked to cache already cached data.
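If you want to confirm that both handles point at the same cached data rather than rely on the warning, a small follow-up sketch (continuing the session above; `spark.catalog.isCached` and `Dataset.storageLevel` are standard Spark 2.x APIs) is:

scala> // The temp view is reported as cached, even though only df.cache was effective
scala> spark.catalog.isCached("df")
res1: Boolean = true

scala> // The Dataset handle reports the storage level set by the single cache call
scala> df.storageLevel
res2: org.apache.spark.storage.StorageLevel = StorageLevel(disk, memory, deserialized, 1 replicas)

Because there is only one cached copy, a single `spark.catalog.uncacheTable("df")` (or `df.unpersist()`) is enough to release it.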

Upvotes: 1
