Sam

Reputation: 323

Get an instance of Azure Databricks Spark in Python code

I am developing a Python package that will be deployed to a Databricks cluster. We often need references to the "spark" and "dbutils" objects within the Python code.

We can access these objects easily within a notebook using "spark" (e.g. spark.sql()). How do we get the Spark instance within the Python code in the package?

Upvotes: 1

Views: 803

Answers (1)

user11244904

Reputation: 26

SparkSession.Builder.getOrCreate:

Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder.

This method first checks whether there is a valid global default SparkSession and, if yes, returns that one. If no valid global default SparkSession exists, the method creates a new SparkSession and assigns the newly created SparkSession as the global default.

So whenever you need an instance of SparkSession and don't want to pass it as an argument:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

Upvotes: 1
