Paam

Reputation: 141

Passing Python modules on HDFS through Livy

On the /user/usr1/ path in HDFS, I placed two scripts, pySparkScript.py and relatedModule.py. relatedModule.py is a Python module that is imported by pySparkScript.py.

I can run the scripts with spark-submit pySparkScript.py

However, I need to run these scripts through Livy. Normally, I run single scripts successfully as follows:

curl -H "Content-Type:application/json" -X POST -d '{"file": "/user/usr1/pySparkScript.py"}' livyNodeAddress/batches

However, when I run the above request, the job fails as soon as it reaches the import relatedModule statement. I realize I should also pass the path to relatedModule.py in the Livy request parameters. I tried the following option:

curl -H "Content-Type:application/json" -X POST -d '{"file": "/user/usr1/pySparkScript.py", "files": ["/user/usr1/relatedModule.py"]}' livyNodeAddress/batches

How should I pass both files to Livy?

Upvotes: 0

Views: 885

Answers (1)

Alex Sasnouskikh

Reputation: 991

Try using the pyFiles property. Please refer to the Livy REST API docs.
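
For example, adapting the request from the question (a sketch, untested; livyNodeAddress is the question's placeholder for your Livy server address):

curl -H "Content-Type: application/json" -X POST -d '{"file": "/user/usr1/pySparkScript.py", "pyFiles": ["/user/usr1/relatedModule.py"]}' livyNodeAddress/batches

pyFiles corresponds to spark-submit's --py-files flag, which places the listed files on the Python search path so import relatedModule resolves, whereas files only distributes them as plain resources.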

Upvotes: 1
