Reputation: 25
I'm submitting a Python file which depends on custom modules to run. The file I'm trying to submit is located at project/main.py, and our modules are located at project/modules/module1.py. I'm submitting to YARN in client mode and receiving the following error:
ModuleNotFoundError: No module named 'modules.module1'
The import statement in main.py:
from modules import module1
I have tried zipping the modules folder and passing it to --py-files:
spark-submit --master yarn --queue OurQueue --py-files hdfs://HOST/path/to/modules.zip \
  --conf "spark.pyspark.driver.python=/hadoop/anaconda3.6/bin/python3" \
  --conf "spark.pyspark.python=/hadoop/anaconda3.6/bin/python3" \
  main.py
Upvotes: 2
Views: 5705
Reputation: 20445
Assuming you have a zip file made (from the project directory) as
zip -r modules.zip modules
I think you are missing attaching this file to the Spark context; you can use the addPyFile() function in the script as
sc.addPyFile("modules.zip")
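For example, a minimal main.py along these lines should work (the SparkSession setup and app name are just assumptions for illustration; the key point is calling addPyFile() before the import):
from pyspark.sql import SparkSession

# Sketch of main.py: the app name is only illustrative.
spark = SparkSession.builder.appName("modules-demo").getOrCreate()
sc = spark.sparkContext

# Attach the zipped package before importing from it; an HDFS URI
# (as in the question) should work here as well as a local path.
sc.addPyFile("modules.zip")

from modules import module1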
Also, don't forget to include an empty __init__.py file at the root level of the package inside modules.zip, i.e. modules/__init__.py.
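For reference, the archive should then contain the package at its top level (assuming module1.py is the only module, as in the question):
modules/__init__.py
modules/module1.py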
Now, to import it, I think you can use
from modules.module1 import *
or
from modules import module1
Update: now run the spark-submit command as
spark-submit --master yarn --queue OurQueue --py-files modules.zip \
  --conf "spark.pyspark.driver.python=/hadoop/anaconda3.6/bin/python3" \
  --conf "spark.pyspark.python=/hadoop/anaconda3.6/bin/python3" \
  main.py
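If it still fails, a quick sanity check (just a sketch, reusing the sc from above) is to import the module inside an executor task to confirm the zip was shipped:
# Sanity check: import the shipped package inside an executor task.
def check_import(_):
    import modules.module1
    return modules.module1.__name__

print(sc.parallelize([0], 1).map(check_import).collect())  # ['modules.module1']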
Upvotes: 3