Reputation: 323
I am developing a Python package which will be deployed to a Databricks cluster. We often need a reference to the `spark` and `dbutils` objects within the Python code.
We can access these objects easily within a notebook as `spark` (e.g. `spark.sql()`). How do we get the `spark` instance within the Python code in the package?
Upvotes: 1
Views: 803
Reputation: 26
From the docs for `SparkSession.builder.getOrCreate()`:
Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder.
This method first checks whether there is a valid global default SparkSession, and if yes, return that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.
So whenever you need an instance of SparkSession and don't want to pass it as an argument:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
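Since the question also asks about `dbutils`, here is a minimal sketch of a helper module inside the package. The function names `get_spark` and `get_dbutils` are illustrative, not any official API; `pyspark.dbutils` ships with the Databricks runtime (and databricks-connect), not with open-source PySpark, so the import is guarded:

```python
def get_spark():
    """Return the active SparkSession, or create one if none exists."""
    # Imported lazily so this module can be imported even where
    # pyspark is not installed (e.g. in lightweight unit tests).
    from pyspark.sql import SparkSession
    return SparkSession.builder.getOrCreate()


def get_dbutils(spark):
    """Return a DBUtils handle on Databricks, or None elsewhere."""
    try:
        # pyspark.dbutils only exists on Databricks runtimes
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    except ImportError:
        return None
```

Package code can then call `get_spark()` wherever it needs the session, instead of threading it through every function signature.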
Upvotes: 1