Reputation: 632
I generated an .egg file. Now I want to run my Spark application using the spark-submit command on my local Windows machine. I have Spark version 2.1.1.
spark-submit --py-files local:///C:/git_local/sparkETL/dist/sparkETL-0.1-py3.6.egg driver.py
This is the command I'm trying, but I'm getting an error:
File not found (c:\spark\bin\driver.py)
Why is spark-submit trying to find the file on a local path when I have already packaged it inside the .egg? I read that .egg files are similar to jar files, so I assumed that, as with a jar file where we pass a class name, I could pass the entry point to spark-submit. I'm passing driver.py, which is my main file, but it is not working.
Upvotes: 1
Views: 2206
Reputation: 1015
spark-submit (in the PySpark case) always requires a Python file as the main application to run, here driver.py. The arguments passed via --py-files are only libraries attached to your Spark job, which driver.py can then import.
To make it work, make sure driver.py exists in the directory from which you trigger spark-submit, or pass its full path, e.g. local:///C:/git_local/sparkETL/driver.py
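Putting that together, the invocation would look something like this (a sketch, assuming driver.py lives in C:/git_local/sparkETL outside the egg, and using cmd.exe's ^ line continuation on Windows):

```shell
:: The egg is only attached via --py-files so driver.py can import modules from it.
:: The last argument must be the main Python script itself, given by a resolvable path.
spark-submit ^
  --py-files C:/git_local/sparkETL/dist/sparkETL-0.1-py3.6.egg ^
  C:/git_local/sparkETL/driver.py
```

If you run it from inside C:/git_local/sparkETL, a plain driver.py as the last argument also works, since spark-submit resolves it relative to the current directory.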
Upvotes: 1