Sharing a Spark session

Question

Let's say I have a Python file, my_python.py, in which I have created a SparkSession named 'spark'. I also have a JAR, my_jar.jar, that contains some Spark logic. I am not creating a SparkSession in the JAR; instead, I want to use the same session created in my_python.py. How do I write a spark-submit command that takes my Python file and my JAR, and passes my SparkSession 'spark' as an argument to the JAR?

Is this possible? If not, please suggest an alternative.

Soumen Chandra · Accepted Answer

I think there are two questions here:

Q1. How can Scala code reuse an already-created Spark session?

Ans: Inside your Scala code, use the builder to get the existing session:

SparkSession.builder().getOrCreate()

See the Spark API documentation: https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/SparkSession.html

Q2. How do you run spark-submit with a .py file as the driver and Scala JAR(s) as supporting JARs?

Ans: It should look something like this:

./spark-submit --jars myjar.jar,otherjar.jar --py-files path/to/myegg.egg path/to/my_python.py arg1 arg2 arg3
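To make the argument order explicit, here is a minimal Python sketch that assembles the same command as a list. The helper name build_spark_submit is hypothetical (it is not part of Spark), and the sketch assumes spark-submit is on the PATH. It only reflects that --jars and --py-files each take one comma-separated value, that options come before the driver script, and that application arguments come after it.

```python
def build_spark_submit(driver, jars=(), py_files=(), app_args=()):
    """Assemble a spark-submit argument list (hypothetical helper, not a Spark API)."""
    cmd = ["spark-submit"]
    if jars:
        # --jars expects a single comma-separated value
        cmd += ["--jars", ",".join(jars)]
    if py_files:
        # --py-files is comma-separated as well
        cmd += ["--py-files", ",".join(py_files)]
    # The driver script comes after all options; its own arguments follow it
    cmd.append(driver)
    cmd += list(app_args)
    return cmd


cmd = build_spark_submit(
    "path/to/my_python.py",
    jars=["myjar.jar", "otherjar.jar"],
    py_files=["path/to/myegg.egg"],
    app_args=["arg1", "arg2", "arg3"],
)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd, check=True) if you want to launch the job from another Python script.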

Note the method name, getOrCreate(): if a Spark session has already been created, no new session is created; the existing one is used instead. For a full implementation example, see: https://www.crowdstrike.com/blog/spark-hot-potato-passing-dataframes-between-scala-spark-and-pyspark/
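On the Python side, one way to reach the Scala code in the JAR is through the JVM gateway that PySpark holds for the driver. The following is a minimal sketch under stated assumptions, not a definitive implementation: com.example.MyJob and its no-argument run() method are hypothetical names for a Scala object inside my_jar.jar, and spark._jvm is an internal, underscore-prefixed PySpark attribute that may change between versions.

```python
def call_scala_entry_point(spark, class_name="com.example.MyJob", method="run"):
    """Call a no-argument method on a JVM class via PySpark's gateway.

    `class_name` and `method` are hypothetical placeholders for code in the JAR.
    """
    target = spark._jvm  # internal PySpark handle to the driver JVM
    # Walk the package path one attribute at a time: com -> example -> MyJob
    for part in class_name.split("."):
        target = getattr(target, part)
    return getattr(target, method)()


# Usage inside my_python.py (assumes my_jar.jar was passed via --jars):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("my_app").getOrCreate()
#   call_scala_entry_point(spark)
```

Inside run(), the Scala code would obtain its session with SparkSession.builder().getOrCreate() rather than receiving it as a spark-submit argument.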
