Reputation: 41
According to Microsoft's documentation, it is possible to upload a Python wheel file so that you can use custom libraries in Synapse Analytics. Here is that documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries
I have created a simple library with just a hello world function, and I was able to install it with pip on my own computer, so I know my wheel file works.
I uploaded my wheel file to the location Microsoft's documentation says to upload it to.
I also found a YouTube video of a person doing exactly what I am trying to do. Here is the video: https://www.youtube.com/watch?v=t4-2i1sPD4U
Microsoft's documentation mentions this, "Custom packages can be added or modified between sessions. However, you will need to wait for the pool and session to restart to see the updated package."
As far as I can tell there is no way to restart a pool, and I also do not know how to tell if the pool is down or has restarted.
When I try to import the library in a notebook, I get a module not found error.
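For anyone debugging the same symptom, a quick check like the one below can be run in a notebook cell to see whether the package is visible to the current session at all. This is only a sketch: `hello_lib` is a hypothetical name standing in for whatever your wheel installs, and the two helper functions are my own, not part of Synapse. It uses only the Python standard library.

```python
import importlib.util
from importlib import metadata


def is_module_available(module_name: str) -> bool:
    """Return True if a top-level module can be found on the session's import path."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution_name: str):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# "hello_lib" is a hypothetical name; substitute the package from your wheel.
print(is_module_available("hello_lib"))
print(installed_version("hello_lib"))
```

If both checks come back negative after the pool has restarted, the wheel was most likely never installed on the pool, which points at the upload or permissions rather than at the notebook code.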
Upvotes: 2
Views: 2149
Reputation: 41
Making changes to the Spark pool's scale settings does restart the Spark pool, as HimanshuSinha-msft suggested. That was not my problem, though.
The actual problem was that I needed the Storage Blob Data Contributor role on the data lake storage account the files were stored in. I assumed that because I already had Owner permissions, and because I could create a folder and upload files there, I had all the permissions I needed. Once I was assigned the Storage Blob Data Contributor role, everything worked.
Upvotes: 2
Reputation: 1776
Scaling the pool up or down will force the cluster to restart.
Upvotes: 1