Steven

Reputation: 53

connecting data lake storage gen 2 with databricks

I am trying to connect Azure Databricks to Data Lake Storage Gen2, but I am not able to match up the client ID, secret scope and key.

I have data in an Azure Data Lake Gen2 account. I am trying to follow these instructions:

https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake-gen2.html#requirements-azure-data-lake

I have created a service principal with the role "Storage Blob Data Contributor" and obtained its client ID and secret.

I have created secret scopes in both Azure Keyvault and Databricks with keys and values

When I try the code below, authentication fails to recognize the secret scope and key. It is not clear to me from the documentation whether the secret scope needs to be in Azure Key Vault or in Databricks.

val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<CLIENT-ID>",
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope = "<SCOPE-NAME>", key = "<KEY-VALUE>"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/XXXXXXXXXX/oauth2/token")

If anybody could help on this, please advise / confirm:

What should CLIENT-ID be? I understand this to come from the storage account.

Where should the SCOPE-NAME and KEY-VALUE be created: in Azure Key Vault or in Databricks?

Upvotes: 5

Views: 11971

Answers (2)

udyan

Reputation: 141

I was facing the same issue. The only extra thing I did was to assign the application's default permission on the Data Lake Gen2 blob container in Azure Storage Explorer. This requires the object ID of the application, which is not the one shown in the portal UI; it can be retrieved with the command "az ad sp show --id" in the Azure CLI. After assigning the permission on the blob container, create a new file and then try to access it.
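A sketch of the lookup described above, assuming the Azure CLI is installed and you are logged in with "az login" (the application ID is a placeholder):

```shell
# Look up the service principal's object ID from its application (client) ID.
# Note: older Azure CLI versions expose this as "objectId"; newer ones use "id".
az ad sp show --id <APPLICATION-CLIENT-ID> --query objectId -o tsv
```

This object ID is the value Azure Storage Explorer expects when you add an ACL entry for the application on the container.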

Upvotes: 0

simon_dmorias

Reputation: 2473

The XXXX in https://login.microsoftonline.com/XXXXXXXXXX/oauth2/token should be your TenantID (get this from the Azure Active Directory tab in the Portal > Properties > DirectoryID).

The Client ID is the ApplicationID/Service Principal ID (sadly these names are used interchangeably in the Azure world - but they are all the same thing).
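Putting the tenant ID and client ID values described above into the asker's snippet, a minimal sketch of a working mount might look like this (the scope name "my-scope", key name "sp-secret", and all angle-bracket placeholders are assumptions, not values from the question):

```scala
// Sketch only: run inside a Databricks notebook, where dbutils is available.
val tenantId = "<TENANT-ID>" // Azure Active Directory > Properties > Directory ID
val clientId = "<CLIENT-ID>" // the service principal's Application ID

val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> clientId,
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope = "my-scope", key = "sp-secret"),
  "fs.azure.account.oauth2.client.endpoint" -> s"https://login.microsoftonline.com/$tenantId/oauth2/token")

// Mount the lake's container so it is reachable under /mnt/lake.
dbutils.fs.mount(
  source = "abfss://<CONTAINER>@<STORAGE-ACCOUNT>.dfs.core.windows.net/",
  mountPoint = "/mnt/lake",
  extraConfigs = configs)
```

Note that the key argument to dbutils.secrets.get is the secret's *name* within the scope, not its value.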

If you have not created a service principal yet follow these instructions: https://learn.microsoft.com/en-us/azure/storage/common/storage-auth-aad-app#register-your-application-with-an-azure-ad-tenant - make sure you grant the service principal access to your lake once it is created.
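Granting the service principal access to the lake can also be scripted; a sketch with the Azure CLI, assuming placeholder subscription, resource group, and account names:

```shell
# Grant the service principal the Storage Blob Data Contributor role
# on the storage account (requires "az login" and sufficient rights).
az role assignment create \
  --assignee <APPLICATION-CLIENT-ID> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<SUBSCRIPTION-ID>/resourceGroups/<RESOURCE-GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE-ACCOUNT>"
```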

You should create a scope and secret for the service principal's key, as this is something you want to keep out of free text. You cannot create these in the Databricks UI (yet); use the Databricks CLI or the Secrets REST API instead.
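A sketch of creating the scope and secret with the Databricks CLI, assuming the CLI is configured ("databricks configure --token"); the scope and key names are placeholders:

```shell
# Create a Databricks-backed secret scope.
databricks secrets create-scope --scope my-scope

# Store the service principal's client secret under a key in that scope.
databricks secrets put --scope my-scope --key sp-secret --string-value "<CLIENT-SECRET>"
```

These are the names you then pass to dbutils.secrets.get(scope = "my-scope", key = "sp-secret") in the notebook.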

Right now I do not think you can create secrets in Azure Key Vault for this, though I expect to see that in the future. Technically you could integrate with Key Vault manually using its APIs, but that would give you another headache: you would need a secret credential just to connect to Key Vault.

Upvotes: 4
