Dan

Reputation: 403

What is the purpose of global temporary views?

Trying to understand how to use the Spark Global Temporary Views.

In one pyspark shell session I've created a view:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('spark_sql').getOrCreate()

df = (
    spark.read.option("header", "true")
    .option("delimiter", ",")
    .option("inferSchema", "true")
    .csv("/user/root/data/cars.csv")
)

df.createGlobalTempView("my_cars")

# works without any problem
spark.sql("SELECT * FROM global_temp.my_cars").show()

In another session I tried to access it, without success (table or view not found):

# second pyspark shell
spark = SparkSession.builder.appName('spark_sql').getOrCreate()
spark.sql("SELECT * FROM global_temp.my_cars").show()

This is the error I receive:

 pyspark.sql.utils.AnalysisException: u"Table or view not found: `global_temp`.`my_cars`; line 1 pos 14;\n'Project [*]\n+- 'UnresolvedRelation `global_temp`.`my_cars`\n"

I've read that each spark-shell has its own context, which is why one shell cannot see the other. So I don't understand: what is the purpose of a global temporary view, and where is it useful?

Thanks

Upvotes: 4

Views: 23779

Answers (2)

Viraj Wadate

Reputation: 6173

Temporary views in Spark SQL are session-scoped and disappear when the session that created them terminates. If you want a temporary view that is shared among all sessions and kept alive until the Spark application terminates, you can create a global temporary view. A global temporary view is tied to the system-preserved database global_temp, and you must use the qualified name to refer to it:

df.createGlobalTempView("people")

Upvotes: 1

Avi Chalbani

Reputation: 880

In the Spark documentation you can see:

If you want to have a temporary view that is shared among all sessions and keep alive until the Spark application terminates, you can create a global temporary view.

The global view remains accessible as long as the application is alive. Opening a new shell and giving it the same application name just creates a new application; it does not attach to the existing one.

You can test it within the same shell (note the parentheses: in PySpark, newSession is a method):

spark.newSession().sql("SELECT * FROM global_temp.my_cars").show()

Please see my answer to a similar question for a more detailed example, as well as a short definition of a Spark application and a Spark session.

Upvotes: 8
