Ged

Reputation: 18098

SPARK TempView Performance

I found this quote on SO 44011846:

Unlike a traditional temp table, a temp view is NOT materialized at all even to memory. It's useful for accessing data in SQL but understand that its statements have to be evaluated EVERY time it's accessed.

Does this mean then that it will go all the way back to Hive if Hive is the source for the TempView?

I think so, and that a high number of repeated accesses would therefore be a performance issue.

Upvotes: 1

Views: 4003

Answers (1)

Kishore

Reputation: 5891

Spark 1.6

Temp tables/views are not stored in memory. They are only a way to access data from Hive or an RDBMS through SQL.

If you are using Hive:

hiveContext.sql("select * from tableA").registerTempTable("tableA")

The statement above only registers the temp table. It is a transformation, so nothing is executed yet. When an action is performed, Spark executes the SQL against Hive to produce the temp table's data, and it does so again on every subsequent action.

If you want to cache the table in memory, use the statement below:

 hiveContext.cacheTable("tableA")

This is also lazily evaluated. The first action executes the SQL and stores the temp table in memory. Later actions run against the in-memory table, so the query is not re-evaluated every time.

Spark 2.0

registerTempTable is deprecated and replaced by createOrReplaceTempView.
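For reference, here is a rough sketch of the same register-then-cache pattern in Spark 2.x Scala. It assumes a Hive-enabled SparkSession and an existing Hive table named tableA; the view name tableA_view and the object name are made up for illustration, so adjust them to your setup.

```scala
import org.apache.spark.sql.SparkSession

object TempViewCachingSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark is built with Hive support and can reach the metastore.
    val spark = SparkSession.builder()
      .appName("TempViewCachingSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Registers the view only; nothing is read from Hive at this point.
    // "tableA_view" is a hypothetical name chosen for this example.
    spark.sql("SELECT * FROM tableA").createOrReplaceTempView("tableA_view")

    // Marks the view for caching. This is lazy as well.
    spark.catalog.cacheTable("tableA_view")

    // The first action runs the query against Hive and fills the cache.
    spark.sql("SELECT COUNT(*) FROM tableA_view").show()

    // Later actions can be served from the in-memory copy.
    spark.sql("SELECT * FROM tableA_view LIMIT 10").show()

    // Reports whether the view is currently marked as cached.
    println(spark.catalog.isCached("tableA_view"))

    // Release the memory once the view is no longer needed.
    spark.catalog.uncacheTable("tableA_view")
    spark.stop()
  }
}
```

Without the cacheTable call, each of the two queries would go back to Hive, which is the repeated-access cost the question is asking about.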

Upvotes: 2
