Reputation: 151
Lets say I have a python file my_python.py in which I have created a SparkSession 'spark' . I have a jar say my_jar.jar in which some spark logic is written. I am not creating SparkSession in my jar , rather I want to use the same session created in my_python.py. How to write a spark-submit
command which take my python file , my jar and my sparksession 'spark' as an argument to my jar file.
Is it possible ? If not , please share the alternative to do so.
Upvotes: 0
Views: 2734
Reputation: 96
So I feel there are two questions -
Q1. How in scala file you can reuse already created spark session?
Ans: Inside your scala code, you should use builder to get an existing session:
SparkSession.builder().getOrCreate()
Please check the Spark doc https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/SparkSession.html
Q2: How you do spark-submit with a .py file as driver and scala jar(s) as supporting jars?
And: It should be in something like this
./spark-submit --jars myjar.jar,otherjar.jar --py-files path/to/myegg.egg path/to/my_python.py arg1 arg2 arg3
So if you notice the method name, it is getOrCreate() - that means if a spark session is already created, no new session will be created rather existing session will be used. Check this link for full implementation example: https://www.crowdstrike.com/blog/spark-hot-potato-passing-dataframes-between-scala-spark-and-pyspark/
Upvotes: 1