ChingamChaapu

Reputation: 33

Azure Synapse: Upload directory of py files in Spark job reference files

I am trying to pass a whole directory of Python files that are referenced in the main Python file of an Azure Synapse Spark job definition, but the files do not appear in the expected location and I get a ModuleNotFoundError. I am trying to upload them like this:

abfss://[directory path in data lake]/*
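For illustration, the main file imports the referenced files like this (helpers is a placeholder name for one of the uploaded .py files, not the actual module):

# main.py -- the job's main definition file
# "helpers" stands in for a .py file passed as a reference file.
import helpers  # raises ModuleNotFoundError; the wildcard upload is not picked up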

Upvotes: 1

Views: 1752

Answers (2)

Skand Upmanyu

Reputation: 31

You have to trick the Spark job definition by exporting it, editing the JSON, and importing it back.

After the export, open the file in a text editor and add the following:

"conf": {
  "spark.submit.pyFiles": 
    "path-to-abfss/module1.zip, path-to-abfss/module2.zip"
},

Now, import the JSON back.
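Note that spark.submit.pyFiles expects archives (or individual .py files), so the package directory has to be zipped first. A minimal sketch of that step, assuming your package lives in a local folder named module1 (a placeholder name):

import zipfile
from pathlib import Path

# Zip the package directory so "import module1" works on the executors.
# "module1" is a placeholder; substitute your own package name.
src = Path("module1")
with zipfile.ZipFile("module1.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for py in src.rglob("*.py"):
        # Keep the package folder as the top level inside the archive.
        zf.write(py, py.relative_to(src.parent))

Upload the resulting zip files to your data lake and reference their abfss:// paths in the conf above.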

Upvotes: 3

Simon Ndunda

Reputation: 994

The way to achieve this on Synapse is to package your Python files into a wheel and upload the wheel package to a specific location in Azure Data Lake Storage, from which your Spark pool will load it every time it starts. This makes the custom Python packages available to all jobs and notebooks using that Spark pool.
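As a rough sketch of the packaging step (mymodule is a placeholder for your package directory), a minimal setup.py could look like this; build the wheel with pip wheel . :

# setup.py -- minimal packaging sketch; "mymodule" is a placeholder name
from setuptools import setup, find_packages

setup(
    name="mymodule",
    version="0.1.0",
    packages=find_packages(),  # discovers mymodule/ and any subpackages
)

The build produces a .whl file, which is what you upload for the Spark pool to install at startup.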

You can find more details in the official documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-python-packages#install-wheel-files

Upvotes: 1
