Reputation: 882
I have created an Azure Databricks instance and launched a workspace and a cluster in it. I have placed the main Python file in the Databricks filesystem (DBFS): dbfs:/FileStore/tables/read_batch.py
This read_batch.py imports another Python file from a directory called my_util. Usage:
from my_util.apps_config import crct_type_list
I have placed apps_config.py inside the my_util directory, which sits alongside the main Python file read_batch.py, i.e. the my_util directory is also present inside dbfs:/FileStore/tables.
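For reference, the layout in DBFS looks like this:

    dbfs:/FileStore/tables/
    ├── read_batch.py          # main entry point
    └── my_util/
        └── apps_config.py     # defines crct_type_list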
When I try to create a spark-submit job in Databricks, I get the following error:
ImportError: No module named 'my_util'
What is the correct way to run this spark-submit job in Databricks without combining everything into a single large Python file?
Upvotes: 0
Views: 2920
Reputation: 882
I zipped the dependent files and uploaded the archive. I made the contents of the zip importable in the main Python file using:
import sys
sys.path.insert(0, "jobs.zip")  # add the zip archive to the module search path
from my_util.apps_config import crct_type_list  # imports now resolve from the zip
I included the zip file during spark-submit using "--py-files jobs.zip". See the following link, which covers best practices for writing production-grade PySpark jobs: https://developerzen.com/best-practices-writing-production-grade-pyspark-jobs-cb688ac4d20f
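As a rough sketch, assuming both the zip and the main script are uploaded to dbfs:/FileStore/tables (the exact paths depend on where you actually put them), the parameters of the Databricks spark-submit task would look something like:

    [
      "--py-files", "dbfs:/FileStore/tables/jobs.zip",
      "dbfs:/FileStore/tables/read_batch.py"
    ]

With --py-files, Spark ships the zip to the executors and adds it to the Python path, so my_util can be imported from the archive.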
Upvotes: 1