user12148436

Reputation: 51

Passing sparkSession Between Scala Spark and PySpark

My requirement is to call a Spark Scala function from an existing PySpark program. What is the best way to pass the SparkSession created in the PySpark program to the Scala function? I pass my Scala jar to PySpark as follows.

spark-submit --jars ScalaExample-0.1.jar pyspark_call_scala_example.py iris.data

Scala code

package com.crowdstrike.dsci.sparkjobs

import org.apache.log4j.Logger
import org.apache.spark.sql.{DataFrame, SparkSession}

object PythonHelper {
  // Runs the given SQL query on the SparkSession passed in from PySpark
  def getDf(spark: SparkSession, query: String, log: Logger): DataFrame = {
    log.info(s"Running query: $query")
    val df = spark.sql(query)
    df
  }
}

PySpark code

import sys

from pyspark.sql import DataFrame, SQLContext, SparkSession

if __name__ == '__main__':
    query = sys.argv[1]

    spark = SparkSession \
        .builder \
        .appName("PySpark using Scala example") \
        .getOrCreate()
    sc = spark.sparkContext
    sqlContext = SQLContext(sc)

    log4jLogger = sc._jvm.org.apache.log4j
    log = log4jLogger.LogManager.getLogger(__name__)

    query_df = DataFrame(sc._jvm.com.crowdstrike.dsci.sparkjobs.PythonHelper.getDf(???, query, ???), sqlContext)

Question

How do I pass the SparkSession and the logger to getDf?

https://www.crowdstrike.com/blog/spark-hot-potato-passing-dataframes-between-scala-spark-and-pyspark/

Upvotes: 4

Views: 1790

Answers (1)

kellanburket

Reputation: 12833

To pass the SparkSession from Python to Scala, use spark._jsparkSession, which exposes the underlying JVM SparkSession object.
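A minimal sketch of the Python side, reusing the names from the question and assuming the Scala method has the three-argument signature (spark, query, log) shown above: pass spark._jsparkSession for the session, a JVM log4j Logger for the logger, and wrap the JVM DataFrame that comes back into a PySpark DataFrame.

import sys

from pyspark.sql import DataFrame, SQLContext, SparkSession

if __name__ == '__main__':
    query = sys.argv[1]

    spark = SparkSession.builder.appName("PySpark using Scala example").getOrCreate()
    sc = spark.sparkContext
    sqlContext = SQLContext(sc)

    # JVM-side log4j Logger, created through the Py4J gateway
    jlogger = sc._jvm.org.apache.log4j.LogManager.getLogger("PythonHelper")

    # spark._jsparkSession is the JVM SparkSession backing the Python one;
    # hand it, the query string, and the JVM logger straight to the Scala method
    jdf = sc._jvm.com.crowdstrike.dsci.sparkjobs.PythonHelper.getDf(
        spark._jsparkSession, query, jlogger)

    # Wrap the returned JVM DataFrame back into a PySpark DataFrame
    query_df = DataFrame(jdf, sqlContext)
    query_df.show()

Depending on your PySpark version, DataFrame may also accept the SparkSession itself as its second argument in place of an SQLContext.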

Upvotes: 2
