Mahesh J

Reputation: 124

Pyspark UDF in Java Spark Program

Is there any way I can use UDFs created in PySpark in a Java Spark job?

I know there is a way to use a Java UDF in PySpark, but I am looking for the other way round.

Upvotes: 0

Views: 283

Answers (1)

ShemTov

Reputation: 707

First, I have to say that I don’t recommend doing that. It will add a lot of latency to the UDF, and I really suggest you try writing the UDF in Scala or Java.

If you still want to do that, here is how: write a UDF that creates a Python interpreter and executes your code. Here is a Scala example using Jython:

import org.python.util.PythonInterpreter
import org.python.core.PyString

// Skip importing Jython's site module on interpreter startup
System.setProperty("python.import.site", "false")

val interpreter = new PythonInterpreter
interpreter.exec("from __builtin__ import *")

// Look up a Python function that takes a string and returns its length
val someFunc = interpreter.get("len")
val result = someFunc.__call__(new PyString("Test!"))
val realResult = result.__tojava__(classOf[Integer]).asInstanceOf[Int]
print(realResult)

This code calls the Python len function on the string "Test!" and returns its result.
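To actually use this inside a Spark job, you would wrap the interpreter call in a Spark UDF. Below is a minimal sketch in Scala, assuming the Jython dependency is on the classpath; the pyLen name and the value column are just illustrative:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import org.python.util.PythonInterpreter
import org.python.core.PyString

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Hypothetical sketch: each call spins up a Jython interpreter and runs Python's len.
// Creating an interpreter per row is exactly the latency problem mentioned above.
val pyLen = udf { (s: String) =>
  val interpreter = new PythonInterpreter
  interpreter.exec("from __builtin__ import *")
  val lenFunc = interpreter.get("len")
  lenFunc.__call__(new PyString(s)).__tojava__(classOf[Integer]).asInstanceOf[Int]
}

val df = Seq("Test!", "Spark").toDF("value")
df.withColumn("length", pyLen($"value")).show()

In practice you would at least reuse a single interpreter per executor instead of creating one per row, but even then the overhead is significant.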

I really think it will hurt your job’s performance, and you should reconsider this plan.

Upvotes: 1
