MelatoninGummies

Reputation: 25

No module error when running spark-submit

I'm submitting a Python file that depends on custom modules. The file I'm trying to submit is located at project/main.py, and our modules are located at project/modules/module1.py. I'm submitting to YARN in client mode and receiving the following error.

ModuleNotFoundError: No module named 'modules.module1'
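For reference, the project layout described above:

    project/
    ├── main.py
    └── modules/
        └── module1.py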

The import statement in main.py:

from modules import module1

I have tried zipping the modules folder and passing it to --py-files:

spark-submit --master yarn --queue OurQueue \
  --py-files hdfs://HOST/path/to/modules.zip \
  --conf "spark.pyspark.driver.python=/hadoop/anaconda3.6/bin/python3" \
  --conf "spark.pyspark.python=/hadoop/anaconda3.6/bin/python3" \
  main.py

Upvotes: 2

Views: 5705

Answers (1)

A.B

Reputation: 20445

Assuming you have made a zip of the package from the project directory:

zip -r modules.zip modules/

I think you are missing the step of attaching this file to the Spark context. You can use the addPyFile() function in the script:

  sc.addPyFile("modules.zip")
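A minimal sketch of how this might look in main.py (the SparkSession setup and app name are assumptions, not from the question; addPyFile also accepts an hdfs:// URI like the one used with --py-files):

    # main.py -- hypothetical sketch
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("my-app").getOrCreate()  # app name is an assumption
    sc = spark.sparkContext

    # Ship the zipped package to the driver and executors so it becomes importable.
    sc.addPyFile("modules.zip")

    # Import only after the zip has been attached.
    from modules import module1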

Also, don't forget to add an empty __init__.py file at the root level inside your zipped directory (modules.zip), i.e. modules/__init__.py.
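So the contents of modules.zip should unpack like this:

    modules.zip
    └── modules/
        ├── __init__.py
        └── module1.py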

Now, to import it, I think you can use either

 from modules.module1 import *

or

 from modules import module1

Update: now run the spark-submit command as

spark-submit --master yarn --queue OurQueue \
  --py-files modules.zip \
  --conf "spark.pyspark.driver.python=/hadoop/anaconda3.6/bin/python3" \
  --conf "spark.pyspark.python=/hadoop/anaconda3.6/bin/python3" \
  main.py
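As a quick local sanity check (a sketch of my own, independent of Spark), you can confirm the zip is importable by putting it on sys.path yourself, since Python treats a zip on the path like a directory of packages:

    # check_zip.py -- hypothetical local test, not from the answer
    import sys

    # zipimport makes the archive's top-level entries importable.
    sys.path.insert(0, "modules.zip")

    from modules import module1  # succeeds if the zip layout is right
    print(module1.__file__)      # e.g. modules.zip/modules/module1.py

If this fails locally, the problem is the zip's internal layout rather than anything Spark-specific.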

Upvotes: 3
