roblovelock

Reputation: 1981

Is it possible to call a Python function from Scala (Spark)?

I am creating a Spark job that requires a column to be added to a DataFrame using a function written in Python. The rest of the processing is done using Scala.

I have found examples of how to call a Java/Scala function from PySpark. The only examples I have found of sending data the other way use pipe.
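For reference, the pipe route streams each row's text representation through an external process line by line. A minimal sketch of the Python side might look like this; the script name, the tab-separated row format, and the tag_row helper are assumptions for illustration, not taken from any specific example:

```python
import sys

def tag_row(line):
    # Hypothetical per-row transform: append one computed field to each
    # tab-separated row of text that the Scala side pipes in.
    return line.rstrip("\n") + "\tpiped"

# Assumed Scala side, piping rows through this script (tag_rows.py):
#   df.rdd.map(_.mkString("\t")).pipe("python tag_rows.py")
if __name__ == "__main__" and not sys.stdin.isatty():
    for line in sys.stdin:
        sys.stdout.write(tag_row(line) + "\n")
```

The drawback of pipe is that rows must be serialized to text and parsed back, which is why a shared-context UDF is usually preferable when available.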

Is it possible for me to send the entire dataframe to a python function, have the function manipulate the data and add additional columns and then send the resulting dataframe back to the calling Scala function?

If this isn't possible, my current fallback is to run a PySpark process and call multiple Scala functions to manipulate the DataFrame, but this isn't ideal.

Upvotes: 5

Views: 8186

Answers (2)

Egor Kraev

Reputation: 519

Just register a UDF from Python, and then from Scala evaluate a SQL statement that uses the function against a DataFrame. It works like a charm; I just tried it. https://github.com/jupyter/docker-stacks/tree/master/all-spark-notebook is a good way to run a Toree notebook that mixes Scala and Python code calling the same Spark context.
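The Python side of this approach can be sketched as follows; it assumes both languages share one SparkSession (as in a Toree notebook), and the names add_exclaim and words are made up for illustration:

```python
def add_exclaim(s):
    """Plain Python column logic we want Scala-side SQL to be able to call."""
    return (s or "") + "!"

try:
    from pyspark.sql import SparkSession  # requires pyspark

    spark = SparkSession.builder.getOrCreate()
    # Register the Python function under a SQL-visible name with a return type.
    spark.udf.register("add_exclaim", add_exclaim, "string")

    df = spark.createDataFrame([("hello",), ("world",)], ["word"])
    df.createOrReplaceTempView("words")
    # Scala side, on the same Spark context, can now invoke the Python UDF
    # through SQL:
    #   spark.sql("SELECT word, add_exclaim(word) AS shouted FROM words").show()
except ImportError:
    pass  # pyspark not installed; the calls above show the registration steps
```

Note the Scala code never calls the Python function directly; SQL is the bridge, so the shared session is what makes this work.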

Upvotes: 1

Sami Badawi

Reputation: 1032

I found this post:

Machine Learning with Jupyter using Scala, Spark and Python: The Setup

It shows you how to set up a Jupyter notebook that uses both Spark and Python. If you are just experimenting with data, that might be enough.

Upvotes: 0
