Reputation: 1351
I am trying to read a Storage account container using a Synapse Notebook, and I hit the below error with a SAS token. My SAS token has the rights to read, list, write, and add. Can someone confirm if I am doing this the right way? I have my Spark pool running.
%%synapse
from pyspark.sql.types import *
from pyspark.sql.window import Window
from pyspark.sql.functions import *
from pyspark.sql import functions as F
from datetime import timedelta, datetime, date

# Configure SAS authentication for the storage account
spark.conf.set("fs.azure.account.auth.type.azdevstoreforlogs.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.azdevstoreforlogs.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.azdevstoreforlogs.dfs.core.windows.net", "<SAS token>")

# List the log folder, then load its JSON files into a DataFrame
mssparkutils.fs.ls("abfss://<container>@azdevstoreforlogs.dfs.core.windows.net/resourceId=/SUBSCRIPTIONS/5jlkhsd-sdhsnjagdEB/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV/y=2022/m=08/d=09/h=11/m=00/")
df = spark.read.format("json").load("abfss://<container>@azdevstoreforlogs.dfs.core.windows.net/resourceId=/SUBSCRIPTIONS/5jlkhsd-sdhsnjagdEB/RESOURCEGROUPS/AZURE-DEV/PROVIDERS/MICROSOFT.CONTAINERSERVICE/MANAGEDCLUSTERS/AZURE-DEV/y=2022/m=08/d=09/h=11/m=00/")
df.show()
SparkStatementException: An error occurred while calling z:mssparkutils.fs.ls.
: Unable to load SAS token provider class: java.lang.RuntimeException: java.lang.RuntimeException:
java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider not found
Upvotes: 0
Views: 1486
Reputation: 14915
First, you do not want to hard-code a SAS token, since it is just a string that anyone who can see your code can use. Second, your code is wrong: the ClassNotFoundException tells you that org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider is not on your Spark pool's classpath, and you also need to replace the "&lt;SAS token&gt;" placeholder with a real token string that grants access to the storage. Look at the section named "ADLS Gen2 storage (without linked services)" in the Synapse documentation for details.
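If you do stick with a SAS token, that documentation section uses Synapse's own token provider class rather than the Hadoop one. Here is a rough sketch reusing the account name from your question; the container name is a placeholder, and you should verify the exact configuration keys against the current docs:

%%pyspark
# Sketch: SAS auth via Synapse's built-in ConfBasedSASProvider,
# which (unlike FixedSASTokenProvider) ships with Synapse Spark pools.
sas_token = "<your SAS token>"  # better: fetch from Key Vault, see below

spark.conf.set(
    "fs.azure.account.auth.type.azdevstoreforlogs.dfs.core.windows.net", "SAS")
spark.conf.set(
    "fs.azure.sas.token.provider.type.azdevstoreforlogs.dfs.core.windows.net",
    "com.microsoft.azure.synapse.tokenlibrary.ConfBasedSASProvider")
spark.conf.set(
    "spark.storage.synapse.azdevstoreforlogs.dfs.core.windows.net.sas", sas_token)

df = spark.read.json("abfss://<container>@azdevstoreforlogs.dfs.core.windows.net/<path>")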
There is a much easier and more secure way to accomplish this task.
A better way is to create a linked service that uses a managed identity. Azure manages the service principal for you behind the scenes. Just make sure you give the managed identity the ACL rights it needs on the folder and storage account.
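With the linked service in place, reading from the notebook takes only a couple of configuration lines pointing at it. A sketch, assuming a linked service named "LS_ADLS" (the name is made up, and these keys have changed between runtime versions, so check the docs):

%%pyspark
# Sketch: authenticate through a linked service backed by the
# workspace managed identity ("LS_ADLS" is a placeholder name).
account = "azdevstoreforlogs.dfs.core.windows.net"
spark.conf.set(f"spark.storage.synapse.{account}.linkedServiceName", "LS_ADLS")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{account}",
    "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")

df = spark.read.json(f"abfss://<container>@{account}/<path>")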
Below is a connection to my storage container for my blog on MS SQL TIPS.
If we right-click the file, we have a bunch of choices. Let's view the contents of the file.
We can see it is a delimited format in which ";" is the delimiter.
If we right-click again and select (New notebook -> Load to DataFrame), Synapse gives us sample code. Add the header and sep options as seen in the image; this allows the file to load correctly, as in the sketch below.
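The generated cell ends up looking roughly like this once the two options are added (the container and file names here are placeholders, not from your setup):

%%pyspark
# Sample code generated by "New notebook -> Load to DataFrame",
# with header and sep added for the semicolon-delimited file.
df = spark.read.load(
    "abfss://<container>@<account>.dfs.core.windows.net/<file>.csv",
    format="csv",
    header=True,  # first row holds the column names
    sep=";"       # file uses ";" as the delimiter
)
display(df)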
For those who are purists, you can do this with spark.conf.set(); the secret information should be pulled from a key vault.
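For example, something like this, assuming a Key Vault linked service exists (the vault, secret, and linked service names are all placeholders):

%%pyspark
# Sketch: pull the SAS token from Azure Key Vault instead of hard-coding it,
# then use it in the spark.conf.set() calls shown earlier.
from notebookutils import mssparkutils

sas_token = mssparkutils.credentials.getSecret(
    "my-key-vault",       # Key Vault name (placeholder)
    "storage-sas-token",  # secret name (placeholder)
    "LS_KeyVault")        # Key Vault linked service name (placeholder)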
For those who want to write low or no code, use the Data section (linked services) of Synapse to define your storage account/container connections, then write your PySpark code to read, manipulate, and write files.
Upvotes: 1