Reputation: 83
I am following the steps from this guide to connect to my Blob storage account, which is General Purpose v2 (not ADLS Gen2): https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/microsoft-spark-utilities?pivots=programming-language-python
My linked service to the Blob storage account uses managed identity, and it works just fine for reading files from a public-access container with this syntax:
blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)
wasb_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set('fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name), blob_sas_token)
df = spark.read.option('multiline','true').json(wasb_path)
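One thing worth double-checking with this pattern: the Spark config key must have the form fs.azure.sas.&lt;container&gt;.&lt;account&gt;.blob.core.windows.net, with the container segment before the account segment, while the wasbs URI puts the container before the @ sign. A minimal sketch (the helper names are mine, not from the docs) that builds both strings from the same inputs so they cannot drift apart:

```python
# Hypothetical helpers to keep the wasbs path and the SAS config key consistent.

def wasbs_path(container: str, account: str, relative_path: str) -> str:
    """Build the wasbs:// URI for a blob path (container comes before '@')."""
    return 'wasbs://%s@%s.blob.core.windows.net/%s' % (container, account, relative_path)

def sas_conf_key(container: str, account: str) -> str:
    """Spark conf key under which the SAS token is registered (container, then account)."""
    return 'fs.azure.sas.%s.%s.blob.core.windows.net' % (container, account)

print(wasbs_path('mycontainer', 'myaccount', 'data/file.json'))
# wasbs://mycontainer@myaccount.blob.core.windows.net/data/file.json
print(sas_conf_key('mycontainer', 'myaccount'))
# fs.azure.sas.mycontainer.myaccount.blob.core.windows.net
```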
But when I try to connect to a private-access container using the same method, I get a "Path does not exist" error.
I cannot change the access level of the container since it is a production system.
I tried creating a linked service with a SAS key at the account level and used the same syntax just to see if it works, but I get:
Py4JJavaError: An error occurred while calling z:mssparkutils.credentials.getConnectionStringOrCreds.
: java.lang.Exception: Access token couldn't be obtained {"result":"DependencyError","errorId":"BadRequest","errorMessage":"LSRServiceException is [{\"StatusCode\":400,\"ErrorResponse\":{\"code\":\"SasTokenNull\",\"message\":\"SAS token is null
Upvotes: 1
Views: 2990
Reputation: 11454
Please recheck the path of your blob file and the roles assigned to the managed identity. If you still face the same error, you can use a linked service for Blob storage with account key authentication as a workaround.
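Since your second attempt failed with a SasTokenNull error, it can also help to inspect what getConnectionStringOrCreds actually returned before setting it in the Spark config. A small sketch (the helper name is mine; real Azure SAS tokens carry sv= and sig= query parameters, which is what it checks for):

```python
# Hypothetical sanity check for a value expected to be a SAS token.
# A valid SAS token includes at least the signed-version (sv=) and
# signature (sig=) query parameters.

def looks_like_sas_token(token: str) -> bool:
    parts = token.lstrip('?').split('&')
    keys = {p.split('=', 1)[0] for p in parts if '=' in p}
    return 'sv' in keys and 'sig' in keys

print(looks_like_sas_token('?sv=2021-08-06&ss=b&sig=abc123'))  # True
print(looks_like_sas_token(''))                                # False
```

If the check fails, the problem is in the linked service configuration (for example, the SAS URL or key was not saved correctly), not in the Spark read itself.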
These are my private containers.
A CSV file inside container3.
Here, when creating the linked service for the Blob storage, select Account key as the Authentication type and provide your subscription and storage account details.
Code:
from notebookutils import mssparkutils  # pre-imported in Synapse notebooks; explicit for clarity

blob_account_name = 'storage account name'
blob_container_name = 'container name'
blob_relative_path = 'blob folder path'
linked_service_name = 'linked service name'

# Get a SAS token from the linked service (account key authentication)
blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)

# Build the wasbs path and register the SAS token for this container
wasb_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set('fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name), blob_sas_token)

df = spark.read.csv(wasb_path)
df.show()
Here I am using the same code from the Official documentation that you provided.
Upvotes: 1