Reputation: 35
I have a metrics.py script which calculates a graph. I can call it from the terminal command line (python ./metrics.py -i [input] [output]).
I want to write a function in Spark that calls the metrics.py script on a provided file path and collects the values that metrics.py prints out.
How can I do that?
Upvotes: 1
Views: 7097
Reputation: 528
In order to run metrics.py, you essentially need to ship it to all the executor nodes that run your Spark job.
To do this, you either pass it via SparkContext -
sc = SparkContext(conf=conf, pyFiles=['path_to_metrics.py'])
or pass it later using the SparkContext's addPyFile method -
sc.addPyFile('path_to_metrics.py')
In either case, after that, do not forget to import metrics and then call the function that produces the output you need.
import metrics
metrics.relevant_function()
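Putting it together, here is a minimal sketch (assumed names: metrics.relevant_function takes a file path and returns the values the script would otherwise print; adjust to whatever metrics.py actually exposes):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('metrics-job')
sc = SparkContext(conf=conf)

# Ship the script so every executor can import it
sc.addPyFile('path_to_metrics.py')

def compute(path):
    # Import on the executor, where addPyFile has made the module available
    import metrics
    # Hypothetical call: returns the values the script normally prints
    return metrics.relevant_function(path)

# One input path per element; collect the computed values back on the driver
paths = sc.parallelize(['/path/to/input1', '/path/to/input2'])
results = paths.map(compute).collect()
print(results)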
Also make sure that all the Python libraries imported inside metrics.py are installed on every executor node. Otherwise, distribute them using the --py-files and --jars options when spark-submitting your job.
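As a rough example of the submit command (file names here are placeholders; deps.zip would hold any extra Python dependencies and some_dependency.jar any required JVM dependency):

spark-submit --py-files metrics.py,deps.zip --jars some_dependency.jar your_spark_job.py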
Upvotes: 4