Reputation: 33
I am trying to pass a whole directory of Python files that are referenced from the main Python file to an Azure Synapse Spark job definition, but the files do not appear in the expected location and I get a ModuleNotFoundError. I am trying to upload them like this:
abfss://[directory path in data lake]/*
Upvotes: 1
Views: 1752
Reputation: 31
You have to trick the Spark job definition by exporting it, editing the JSON, and importing it back.
After the export, open the file in a text editor and add the following:
"conf": {
"spark.submit.pyFiles":
"path-to-abfss/module1.zip, path-to-abfss/module2.zip"
},
Now, import the JSON back.
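If you still need to produce those zip archives from a local directory of .py files, a minimal sketch (assuming your modules live in a local folder named module1/, which is a placeholder) could look like this:

# Sketch: package a local folder of .py files into a zip that Spark can load
# via spark.submit.pyFiles. The folder name "module1" is an assumption;
# replace it with your own package directory.
import zipfile
from pathlib import Path

src = Path("module1")      # package directory with __init__.py and your modules
out = Path("module1.zip")  # archive to upload to your abfss:// location

with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
    for py_file in src.rglob("*.py"):
        # keep the package directory in the archive path so that
        # "import module1.something" resolves on the executors
        zf.write(py_file, py_file.relative_to(src.parent))

Then upload the resulting zip to the abfss path you reference in spark.submit.pyFiles (for example with Azure Storage Explorer or azcopy).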
Upvotes: 3
Reputation: 994
The way to achieve this on Synapse is to package your Python files into a wheel and upload the wheel package to a specific location in the Azure Data Lake Storage account, where your Spark pool will load it from every time it starts. This makes the custom Python packages available to all jobs and notebooks that use that Spark pool.
You can find more details in the official documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-python-packages#install-wheel-files
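As a minimal packaging sketch (the package name mypackage and the version below are placeholders, not anything from the question), a setup.py like the following, built with a tool such as python -m build or pip wheel ., produces the .whl file to upload:

# setup.py: minimal sketch; "mypackage" and the metadata below are
# assumptions, replace them with your own package name and version.
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),  # picks up mypackage/ and any subpackages
)

Once the wheel is uploaded to the location described in the linked documentation and the pool picks it up, import mypackage should work in jobs and notebooks on that pool.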
Upvotes: 1