Reputation: 51
What is the use of the getOrCreate()
method in the SparkContext class,
and how can I use it? I could not find any suitable example (code-wise) for this.
What I understand is that, using the above method, I can share a SparkContext between applications. What do we mean by applications here? Is an application a different job submitted to a Spark cluster? If so, should we be able to use global variables (broadcast) and temp tables registered in one application in another application?
I would appreciate it if anyone could elaborate and give a suitable example of this.
Upvotes: 3
Views: 19129
Reputation: 153
getOrCreate
public SparkSession getOrCreate()
Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder. This method first checks whether there is a valid thread-local SparkSession and, if so, returns that one. It then checks whether there is a valid global default SparkSession and, if so, returns that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.
In case an existing SparkSession is returned, the config options specified in this builder will be applied to the existing SparkSession.
Please check this link: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/sql/SparkSession.Builder.html
An example can be:
from pyspark.sql import SparkSession

# Build (or reuse) the application-wide SparkSession
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()
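If getOrCreate() is called again later in the same application, it returns that existing session rather than building a new one, and applies any new config options to it. A minimal sketch of that behaviour (the config key here is just an illustrative choice):

from pyspark.sql import SparkSession

# First call creates the global default SparkSession
spark1 = SparkSession.builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

# A later call gets the existing session instead of creating another,
# and applies the options set on this builder to it
spark2 = SparkSession.builder \
    .config("spark.sql.shuffle.partitions", "8") \
    .getOrCreate()

print(spark1 is spark2)  # True: both names refer to the same SparkSession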
Upvotes: 1
Reputation: 1801
As given in the Javadoc for SparkContext, getOrCreate() is useful when applications may wish to share a SparkContext. So yes, you can use it to share a SparkContext object across applications. And yes, you can re-use broadcast variables and temp tables across them.
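For example, a minimal PySpark sketch of SparkContext.getOrCreate() (the app name is just a placeholder):

from pyspark import SparkConf, SparkContext

# Returns the SparkContext already running in this process if there is one,
# otherwise creates a new one from the supplied configuration
sc = SparkContext.getOrCreate(SparkConf().setAppName("shared-context"))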
As for understanding Spark applications, please refer to this link. In short, an application is the highest-level unit of computation in Spark, and what you submit to a Spark cluster is not a job but an application. Invoking an action inside a Spark application triggers the launch of a job to fulfill it.
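To illustrate the application/job distinction, a small sketch (reusing the sc from the snippet above); transformations are lazy, and only the action launches a job:

# Transformations such as parallelize/map are lazy; no job runs yet
rdd = sc.parallelize(range(100)).map(lambda x: x * 2)

# Invoking an action is what triggers the launch of a job
print(rdd.count())  # 100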
Upvotes: 4