Garrett Sippel

Reputation: 41

How do you import a custom Python library into an Apache Spark pool in Azure Synapse Analytics?

According to Microsoft's documentation, it is possible to upload a Python wheel file so that you can use custom libraries in Synapse Analytics. Here is that documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries

I have created a simple library with just a hello-world function, and I was able to install it with pip on my own computer, so I know my wheel file works.
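For reference, a wheel that minimal can be built from two small files. This is just a sketch; the package name hello_lib and the layout are placeholders, not the real library from the question:

    # hello_lib/__init__.py -- the entire "library"
    def hello_world():
        """Return a greeting, enough to confirm the wheel imports."""
        return "Hello, world!"

    # setup.py -- minimal build metadata for the wheel
    from setuptools import setup, find_packages

    setup(
        name="hello_lib",
        version="0.1.0",
        packages=find_packages(),
    )

Running pip wheel . (or python -m build) in the project root produces the .whl file to upload.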

I uploaded my wheel file to the location where Microsoft's documentation says to upload it.

I also found a YouTube video of someone doing exactly what I am trying to do. Here is the video: https://www.youtube.com/watch?v=t4-2i1sPD4U

Microsoft's documentation mentions this: "Custom packages can be added or modified between sessions. However, you will need to wait for the pool and session to restart to see the updated package."

As far as I can tell there is no way to restart a pool directly, and I also do not know how to tell whether the pool is down or has restarted.
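From a notebook, one way to check whether the pool has picked up the package after a restart is to look for it with the standard library (a sketch, using the hypothetical hello_lib package name from above):

    # Run in a Synapse notebook cell; standard library only.
    import importlib.util

    spec = importlib.util.find_spec("hello_lib")  # placeholder package name
    print("installed" if spec else "not visible yet; the pool may not have restarted")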

When I try to import the library in a notebook, I get a ModuleNotFoundError.
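A minimal reproduction of the failure, with hello_lib standing in for the real package name:

    # In a Synapse notebook cell:
    from hello_lib import hello_world  # placeholder package name

    print(hello_world())
    # When the wheel is not installed on the pool, the import raises:
    # ModuleNotFoundError: No module named 'hello_lib'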

Upvotes: 2

Views: 2149

Answers (2)

Garrett Sippel

Reputation: 41

Making changes to the Spark pool's scale settings does restart the Spark pool, as HimanshuSinha-msft suggested. That was not my problem, though.

The actual problem was that I needed the Storage Blob Data Contributor role on the Data Lake Storage account where the files were stored. I assumed that because I already had the Owner role, and because I could create a folder and upload files there, I had all the permissions I needed. Once I was granted the Storage Blob Data Contributor role, everything worked.
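Owner is a control-plane role; reading and writing blob data goes through the separate Storage Blob Data roles. One way to check whether an identity has data-plane access is to attempt a read with it. A sketch using the azure-identity and azure-storage-blob packages, with placeholder account and container names:

    # pip install azure-identity azure-storage-blob
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    # Placeholders: substitute your storage account and container/filesystem.
    service = BlobServiceClient(
        account_url="https://<account>.blob.core.windows.net",
        credential=DefaultAzureCredential(),
    )
    container = service.get_container_client("<container>")

    # Listing blobs needs a data-plane role such as Storage Blob Data
    # Contributor; the control-plane Owner role alone is not enough.
    try:
        next(iter(container.list_blobs()), None)
        print("data-plane access OK")
    except Exception as exc:  # typically HttpResponseError with status 403
        print("no data-plane access:", exc)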

Upvotes: 2

Himanshu Kumar Sinha

Reputation: 1776

Scaling the pool up or down will force the cluster to restart.

Upvotes: 1
