Haha

Reputation: 1019

Write from Databricks to a storage account using a Service Principal token

I know how to write from Databricks using a storage account access key.

spark.conf.set(
  "fs.azure.account.key.MyStorageAccount.blob.core.windows.net",
  "XxXxXxXxXxXxXxXxXxXxXxXxXxXxXx")

df = spark.createDataFrame([(1, "foo")],["id", "label"])

df.write.format("delta").save("wasbs://[email protected]/HERE")

Now I want to do the same using a Service Principal.

I found this code to generate a valid SP token:

client_id = "AXAXAXAX"
secret_id = "YNTHBRGEZFGTYUI"
storage_account_url = "https://MyStorageAccount.blob.core.windows.net/"

token_credential = ClientSecretCredential(TENANT_ID, CLIENT_ID, CLIENT_SECRET)
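
If I need an actual bearer token from this credential, I believe something like this would work (the storage resource scope is my assumption, not something from the code I found):

# Assumption: the Azure Storage OAuth scope is "https://storage.azure.com/.default"
access_token = token_credential.get_token("https://storage.azure.com/.default").token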

How can I pass it to my Spark session in order to access the desired storage account? My SPN already has the Contributor role on the desired storage account MyStorageAccount.

EDIT:

After further searching, I found the following tutorial, so I wrote the same code:

spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", client_id)
spark.conf.set("fs.azure.account.oauth2.client.secret", secret_id)
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/TENANT_ID/oauth2/token")

df = spark.createDataFrame([(1, "foo")],["id", "label"])

df.write.format("delta").save("wasbs://[email protected]/HERE")

But I am having the following error:

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: 
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container MyContainer in account MyStorageAccount.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.

Upvotes: 2

Views: 831

Answers (1)

Kombajn zbożowy

Reputation: 10703

The tutorial you linked is for Azure Synapse.

Databricks allows you to mount Blob Storage using an account key or SAS token -> docs. A SAS-based mount is sketched below.
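
For example, a SAS-based mount could look roughly like this (the container name, mount point, and SAS token are placeholders, not values from your setup):

# Mount the Blob Storage container with a SAS token, then write through the mount point.
dbutils.fs.mount(
    source="wasbs://MyContainer@MyStorageAccount.blob.core.windows.net",
    mount_point="/mnt/mycontainer",
    extra_configs={"fs.azure.sas.MyContainer.MyStorageAccount.blob.core.windows.net": "<SAS_TOKEN>"})

df.write.format("delta").save("/mnt/mycontainer/HERE")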

You can access Data Lake Storage using OAuth -> docs. So if you are able to convert your storage account (i.e. enable hierarchical namespace), then you'll be able to use it.
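
If you do convert it, the OAuth settings are scoped to the storage account, use the dfs endpoint, and the path switches to abfss://. Roughly like this, reusing the client_id / secret_id / TENANT_ID placeholders from your question:

# Service principal (OAuth) config for ADLS Gen2, scoped to the storage account.
spark.conf.set("fs.azure.account.auth.type.MyStorageAccount.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.MyStorageAccount.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.MyStorageAccount.dfs.core.windows.net", client_id)
spark.conf.set("fs.azure.account.oauth2.client.secret.MyStorageAccount.dfs.core.windows.net", secret_id)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.MyStorageAccount.dfs.core.windows.net",
               "https://login.microsoftonline.com/TENANT_ID/oauth2/token")

# Note the abfss scheme and the dfs (not blob) endpoint.
df.write.format("delta").save("abfss://MyContainer@MyStorageAccount.dfs.core.windows.net/HERE")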

Upvotes: 0
