Kzryzstof
Kzryzstof

Reputation: 8382

Databricks fails accessing a Data Lake Gen1 while trying to enumerate a directory

I am using (well... trying to use) Azure Databricks and I have created a notebook.

I would like the notebook to connect my Azure Data Lake (Gen1) and transform the data. I followed the documentation and put the code in the first cell of my notebook:

spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "**using the application ID of the registered application**")
spark.conf.set("dfs.adls.oauth2.credential", "**using one of the registered application keys**")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/**using my-tenant-id**/oauth2/token")

dbutils.fs.ls("adl://**using my data lake uri**.azuredatalakestore.net/tenantdata/events")

The execution fails with this error:

com.microsoft.azure.datalake.store.ADLException: Error enumerating directory /

Operation null failed with exception java.io.IOException : Server returned HTTP response code: 400 for URL: https://login.microsoftonline.com/using my-tenant-id/oauth2/token Last encountered exception thrown after 5 tries.

[java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException] [ServerRequestId:null] at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169) at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectoryInternal(ADLStoreClient.java:558) at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:534) at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:398) at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:384)

I have given the registered application the Reader role to the Data Lake:

enter image description here

Question

How can I allow Spark to access the Data Lake?

Update

I have granted both the tenantdata and events folders Read and Execute access:

Granted persmissions on folder

Upvotes: 0

Views: 4381

Answers (1)

simon_dmorias
simon_dmorias

Reputation: 2473

The RBAC roles on the Gen1 lake do not grant access to the data (just the resource itself), with exception of the Owner role which grants Super User access and does grant full data access.

You must grant access to the folders/files themselves using Data Explorer in the Portal or download storage explorer using POSIX permissions.

This guide explains the detail of how to do that: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control

Reference: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data

Only the Owner role automatically enables file system access. The Contributor, Reader, and all other roles require ACLs to enable any level of access to folders and files

Upvotes: 1

Related Questions