Nandha

Reputation: 882

Import additional Python files in the main Python file used in a Databricks spark-submit job

I have created a Databricks service in Azure and launched a workspace and a cluster in it. I have placed the main Python file in the Databricks filesystem: dbfs:/FileStore/tables/read_batch.py

read_batch.py imports another Python file from a directory called my_util:

from my_util.apps_config import crct_type_list

I have placed apps_config.py inside the my_util directory, which sits alongside the main Python file read_batch.py, i.e. the my_util directory is also present inside dbfs:/FileStore/tables.
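So the layout on DBFS is:

dbfs:/FileStore/tables/read_batch.py
dbfs:/FileStore/tables/my_util/apps_config.py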

When I try to create a spark-submit job in Databricks, I get the following error:

ImportError: No module named 'my_util'

What is the correct way to run this spark-submit job in Databricks without merging everything into a single large Python file?

Upvotes: 0

Views: 2920

Answers (1)

Nandha

Reputation: 882

I zipped the dependent files and uploaded the archive. I imported the contents of the zip file in the main Python file using:

import sys
sys.path.insert(0, "jobs.zip")  # the path must be a string
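Putting it together, the top of read_batch.py would look something like this minimal sketch (assuming the archive is named jobs.zip and contains the my_util directory at its root):

import sys

# Prepend the zip to the module search path; this must run
# before anything is imported from my_util.
sys.path.insert(0, "jobs.zip")

from my_util.apps_config import crct_type_list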

I included the zip file during spark-submit using "--py-files jobs.zip". Refer to the following link, which discusses best practices for spark-submit jobs: https://developerzen.com/best-practices-writing-production-grade-pyspark-jobs-cb688ac4d20f
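For illustration, a plain spark-submit invocation with the archive attached would look something like this (the dbfs:/ paths are assumed from the question; in a Databricks spark-submit task, these same arguments go into the job's parameters list):

spark-submit --py-files dbfs:/FileStore/tables/jobs.zip dbfs:/FileStore/tables/read_batch.py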

Upvotes: 1
