Reputation: 51
My requirement is to call a Spark Scala function from an existing PySpark program. What is the best way to pass the SparkSession created in the PySpark program to the Scala function? I pass my Scala jar to PySpark as follows.
spark-submit --jars ScalaExample-0.1.jar pyspark_call_scala_example.py iris.data
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.log4j.Logger

def getDf(spark: SparkSession, query: String, df: DataFrame, log: Logger): DataFrame = {
  import spark.implicits._
  // Return the query result directly; re-declaring `df` here would shadow the parameter
  spark.sql(query)
}
import sys
from pyspark.sql import DataFrame, SparkSession, SQLContext

if __name__ == '__main__':
    query = sys.argv[1]
    spark = SparkSession \
        .builder \
        .appName("PySpark using Scala example") \
        .getOrCreate()
    # needed for the py4j JVM gateway and for wrapping the returned Java DataFrame
    sc = spark.sparkContext
    sqlContext = SQLContext(sc)
    log4jLogger = sc._jvm.org.apache.log4j
    log = log4jLogger.LogManager.getLogger(__name__)
    query_df = DataFrame(sc._jvm.com.crowdstrike.dsci.sparkjobs.PythonHelper.getDf(???, query, ???), sqlContext)
How do I pass the SparkSession and logger to getDf?
Upvotes: 4
Views: 1790
Reputation: 12833
To pass a SparkSession from Python to Scala, use spark._jsparkSession.
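For reference, here is a minimal sketch of how that could be wired into the question's program. It assumes the four-argument getDf signature and the com.crowdstrike.dsci.sparkjobs.PythonHelper class from the question; spark._jsparkSession supplies the Java SparkSession, a DataFrame's _jdf supplies its Java counterpart, and the Java DataFrame returned by Scala is wrapped back into a PySpark DataFrame.

import sys
from pyspark.sql import DataFrame, SparkSession, SQLContext

if __name__ == '__main__':
    query = sys.argv[1]
    spark = SparkSession.builder.appName("PySpark using Scala example").getOrCreate()
    sc = spark.sparkContext
    sqlContext = SQLContext(sc)

    # Java-side classes are reachable through the py4j gateway on sc._jvm
    jvm = sc._jvm
    log = jvm.org.apache.log4j.LogManager.getLogger(__name__)

    # Placeholder DataFrame, only to satisfy the (unused) df parameter of getDf
    seed_df = spark.range(1)

    # spark._jsparkSession is the underlying Java SparkSession the Scala method expects;
    # seed_df._jdf is the underlying Java Dataset[Row]
    j_df = jvm.com.crowdstrike.dsci.sparkjobs.PythonHelper.getDf(
        spark._jsparkSession, query, seed_df._jdf, log)

    # Wrap the returned Java DataFrame back into a PySpark DataFrame
    query_df = DataFrame(j_df, sqlContext)
    query_df.show()

Note that only JVM-backed handles cross the py4j boundary here (spark._jsparkSession, seed_df._jdf, the py4j Logger proxy); the Python wrapper objects themselves cannot be passed to the Scala method.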
Upvotes: 2