Reputation: 1351
I am trying to read a Storage account container using a Synapse Notebook, and I hit the below error with a SAS token. My SAS token has the rights to read, list, write, and add. Can someone confirm if I am doing this the right way? I have my Spark pool running.
%%synapse
from pyspark.sql.types import *
from pyspark.sql.window import Window
from pyspark.sql.functions import *
from pyspark.sql import functions as F
from datetime import timedelta, datetime, date

# Configure SAS authentication for the storage account
spark.conf.set("fs.azure.account.auth.type.azdevstoreforlogs.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.azdevstoreforlogs.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.azdevstoreforlogs.dfs.core.windows.net", "<SAS token>")

# List the log folder, then load its JSON files into a DataFrame
mssparkutils.fs.ls("abfss://<container>@azdevstoreforlogs.dfs.core.windows.net/resourceId=/SUBSCRIPTIONS/5jlkhsd-sdhsnjagdEB/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV/y=2022/m=08/d=09/h=11/m=00/")
df = spark.read.format("json").load("abfss://<container>@azdevstoreforlogs.dfs.core.windows.net/resourceId=/SUBSCRIPTIONS/5jlkhsd-sdhsnjagdEB/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV/y=2022/m=08/d=09/h=11/m=00/")
df.show()
SparkStatementException: An error occurred while calling z:mssparkutils.fs.ls.
: Unable to load SAS token provider class: java.lang.RuntimeException: java.lang.RuntimeException:
java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider not found
Upvotes: 0
Views: 1486
Reputation: 14915
First, you do not want to hard-code a SAS token, since it is just a string that anyone who can see your code can use. Second, your code is wrong: the ClassNotFoundException tells you that org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider is not on your Spark pool's classpath, and you also need to replace the "&lt;SAS token&gt;" placeholder with a real token string that grants access to the storage. Look at the section named "ADLS Gen2 storage (without linked services)" in the Synapse documentation for details.
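If you do stick with a SAS token, that documentation section uses Synapse's own token provider class rather than the Hadoop one. Here is a rough sketch reusing the account name from your question; the container name is a placeholder, and you should verify the exact configuration keys against the current docs:

%%pyspark
# Sketch: SAS auth via Synapse's built-in ConfBasedSASProvider,
# which (unlike FixedSASTokenProvider) ships with Synapse Spark pools.
sas_token = "<your SAS token>"  # better: fetch from Key Vault, see below

spark.conf.set(
    "fs.azure.account.auth.type.azdevstoreforlogs.dfs.core.windows.net", "SAS")
spark.conf.set(
    "fs.azure.sas.token.provider.type.azdevstoreforlogs.dfs.core.windows.net",
    "com.microsoft.azure.synapse.tokenlibrary.ConfBasedSASProvider")
spark.conf.set(
    "spark.storage.synapse.azdevstoreforlogs.dfs.core.windows.net.sas", sas_token)

df = spark.read.json("abfss://<container>@azdevstoreforlogs.dfs.core.windows.net/<path>")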
There is a much easier and more secure way to accomplish this task.
A better way is to create a linked service that uses a managed identity. Azure manages the service principal for you behind the scenes. Just make sure you give the managed identity the ACL rights it needs on the folder and storage account.
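With the linked service in place, reading from the notebook takes only a couple of configuration lines pointing at it. A sketch, assuming a linked service named "LS_ADLS" (the name is made up, and these keys have changed between runtime versions, so check the docs):

%%pyspark
# Sketch: authenticate through a linked service backed by the
# workspace managed identity ("LS_ADLS" is a placeholder name).
account = "azdevstoreforlogs.dfs.core.windows.net"
spark.conf.set(f"spark.storage.synapse.{account}.linkedServiceName", "LS_ADLS")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{account}",
    "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")

df = spark.read.json(f"abfss://<container>@{account}/<path>")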
Below is a connection to my storage container for my blog on MS SQL TIPS.
If we right-click the file, we have a bunch of choices. Let's view the contents of the file.
We can see it is a delimited format in which ";" is the delimiter.
If we right-click again and select (New notebook -> Load to DataFrame), Synapse gives us sample code. Add the header and sep options as seen in the image; this allows the file to load correctly, as in the sketch below.
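The generated cell ends up looking roughly like this once the two options are added (the container and file names here are placeholders, not from your setup):

%%pyspark
# Sample code generated by "New notebook -> Load to DataFrame",
# with header and sep added for the semicolon-delimited file.
df = spark.read.load(
    "abfss://<container>@<account>.dfs.core.windows.net/<file>.csv",
    format="csv",
    header=True,  # first row holds the column names
    sep=";"       # file uses ";" as the delimiter
)
display(df)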
For those who are purists, you can do this with spark.conf.set(); the secret information should be pulled from a key vault.
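For example, something like this, assuming a Key Vault linked service exists (the vault, secret, and linked service names are all placeholders):

%%pyspark
# Sketch: pull the SAS token from Azure Key Vault instead of hard-coding it,
# then use it in the spark.conf.set() calls shown earlier.
from notebookutils import mssparkutils

sas_token = mssparkutils.credentials.getSecret(
    "my-key-vault",       # Key Vault name (placeholder)
    "storage-sas-token",  # secret name (placeholder)
    "LS_KeyVault")        # Key Vault linked service name (placeholder)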
For those who want to write low or no code, use the Data section (linked services) of Synapse to define your storage account/container connections, then write your PySpark code to read, manipulate, and write files.
Upvotes: 1