Tahmeed

Reputation: 48

Unable to read data from ADLS gen 2 in Azure Databricks

I have followed this Microsoft documentation to connect to my ADLS Gen2 storage account: https://learn.microsoft.com/en-gb/azure/databricks/connect/storage/tutorial-azure-storage

and used this to authenticate according to step 6:

service_credential = dbutils.secrets.get(scope="<scope>",key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

Now when I run this:

df = spark.read.csv("abfss://<filepath>")

I am getting this error: abfss://filepath has invalid authority.
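For reference, the tutorial's example reads use the full container@account authority in the URI; a sketch of that form with placeholder names only (not my real values):

df = spark.read.csv("abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path-to-file>")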

I have double-checked:

  1. tenant ID of the SP
  2. client ID of the SP
  3. secret scope name, created according to the above-mentioned documentation
  4. the role of the service principal on the container ("Storage Blob Data Contributor")

File Service properties of my storage account:

Large file share: Disabled

Identity-based access: Not configured

Default share-level permissions: Disabled

Soft delete: Enabled (7 days)

Share capacity: 5 TiB

Upvotes: 0

Views: 110

Answers (1)

Tahmeed

Reputation: 48

The secret scope for the SP didn't work even though the SP had the "Storage Blob Data Contributor" role, so I created a secret scope for the storage account's access key instead and it worked without any issues. I'm not sure exactly what the issue with the SP was. I used this:

spark.conf.set(f"fs.azure.account.key.<container>.blob.core.windows.net", dbutils.secrets.get("scope-name", "secret-name"))

df = spark.read.csv(f"wasbs://container-name@sa_name.blob.core.windows.net/filepath")

Upvotes: 0
