Reputation: 101
I have the following structure in a Google Cloud Storage (GCS) bucket:
gs://my_bucket/py_scripts/
wrapper.py
mymodule.py
__init__.py
I am running wrapper.py through Dataproc as a PySpark job. It imports mymodule at the start using import mymodule, but the job fails with an error saying there is no module named mymodule, even though both files are at the same path. This works fine in a Unix environment. Note that __init__.py is empty. I also tested from mymodule import myfunc, but it returns the same error.
Upvotes: 4
Views: 2182
Reputation: 74
Can you provide your PySpark job submit command? I suspect you are not passing the --py-files parameter, which supplies the additional Python files to the job. See https://cloud.google.com/sdk/gcloud/reference/dataproc/jobs/submit/pyspark for reference. Dataproc will not automatically pick up other files in the same GCS bucket as inputs to the job.
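As a sketch, a submit command along these lines should make mymodule importable; the cluster name and region here are assumptions, and the paths are taken from the bucket layout in the question:

gcloud dataproc jobs submit pyspark gs://my_bucket/py_scripts/wrapper.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --py-files=gs://my_bucket/py_scripts/mymodule.py

The --py-files flag distributes the listed files to the Spark workers and adds them to the Python path, which is why the import works in a local Unix environment (where mymodule.py sits alongside the script in the working directory) but not on Dataproc without it.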
Upvotes: 2