Fengyu

Reputation: 35

How to call a Python script in Spark?

I have a metrics.py which calculates a graph.

I can call it in the terminal command line (python ./metrics.py -i [input] [output]).

I want to write a function in Spark that calls the metrics.py script on a provided file path and collects the values that metrics.py prints out.

How can I do that?

Upvotes: 1

Views: 7097

Answers (1)

Shantanu Alshi

Reputation: 528

To run metrics.py, you essentially need to ship it to all the executor nodes that run your Spark job.

To do this, you can either pass it when creating the SparkContext -

sc = SparkContext(conf=conf, pyFiles=['path_to_metrics.py'])

or pass it later using the SparkContext's addPyFile method -

sc.addPyFile('path_to_metrics.py')

In either case, do not forget to import metrics afterwards and call whichever function gives the output you need -

import metrics
metrics.relevant_function()
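
For instance, here is a minimal end-to-end sketch. It assumes metrics.py exposes a hypothetical compute(path) function that returns the values it would otherwise print, and the input paths are placeholders -

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('metrics-job')
# Ship metrics.py to every executor when the context is created
sc = SparkContext(conf=conf, pyFiles=['path_to_metrics.py'])

def run_metrics(path):
    # Import inside the function so the import happens on the executor,
    # where the shipped copy of metrics.py is on the Python path
    import metrics
    return metrics.compute(path)  # hypothetical function name

# Run metrics.py against each input path and collect the results on the driver
paths = sc.parallelize(['/data/input1', '/data/input2'])
results = paths.map(run_metrics).collect()
print(results)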

Also make sure that all the Python libraries imported inside metrics.py are installed on every executor node. Otherwise, ship them along with your job using the --py-files and --jars options when spark-submitting, as sketched below.
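
For example, an invocation might look like this (the file names are placeholders; dependencies.zip stands for an assumed bundle of extra Python modules) -

spark-submit --py-files path_to_metrics.py,dependencies.zip --jars extra_library.jar your_job.py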

Upvotes: 4
