Spark session/context lifecycle

Question

I don't understand how the spark session/context lifecycle works. The documentation says that you can have multiple SparkSessions that share an underlying SparkContext. But how/when are those created and destroyed? For example, if I have a production cluster and I spark-submit 10 ETLs, will these 10 jobs share the same SparkContext? Does it matter if I do this in cluster/client mode? To the best of my understanding, the SparkContext lives in the driver so I assume the above would result in one SparkContext shared by 10 SparkSessions, but I'm not at all sure I got this correctly... Any clarification will be much appreciated.

Nayan Sharma · Accepted Answer

Let's first look at what SparkSession and SparkContext are.

SparkContext is the channel through which all Spark functionality is accessed. The Spark driver program uses it to connect to and communicate with the cluster manager (for example, YARN) and to submit Spark jobs. Through SparkContext, the driver can also access other contexts such as SQLContext, HiveContext, and StreamingContext to program Spark.

Since Spark 2.0, SparkSession provides a single, unified point of entry to all of the functionality mentioned above.

In other words, SparkSession encapsulates SparkContext.

Let's say multiple users access the same notebook, which has a shared SparkContext, and each user needs an isolated environment while still sharing that context. Prior to 2.0, the only way to get isolation was to create a separate SparkContext per user or isolated environment, which is an expensive operation (and only a single SparkContext can exist per JVM). With the introduction of SparkSession, this issue has been addressed: each user can get their own session, with its own configuration and temporary views, on top of the same SparkContext.

Regarding your question: "I spark-submit 10 ETLs, will these 10 jobs share the same SparkContext? Does it matter if I do this in cluster/client mode? To the best of my understanding, the SparkContext lives in the driver so I assume the above would result in one SparkContext shared by 10 SparkSessions."

If you submit 10 ETL jobs with spark-submit, whether in cluster or client mode, they are 10 different applications, and each one has its own SparkContext and SparkSession. The deploy mode only changes where the driver runs, not how many contexts are created. In native Spark you can't share objects between different applications; if you want to share objects, you have to use a shared context (for example, spark-jobserver). Other options are available, such as Apache Livy and Apache Ignite.
