Reputation: 8382
I am using (well... trying to use) Azure Databricks and I have created a notebook.
I would like the notebook to connect to my Azure Data Lake (Gen1) and transform the data. I followed the documentation and put this code in the first cell of my notebook:
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "**using the application ID of the registered application**")
spark.conf.set("dfs.adls.oauth2.credential", "**using one of the registered application keys**")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/**using my-tenant-id**/oauth2/token")
dbutils.fs.ls("adl://**using my data lake uri**.azuredatalakestore.net/tenantdata/events")
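For reference, the same four settings can be assembled by a small helper that fails early if a placeholder is left in the tenant ID. This is only a sketch: `adls_gen1_oauth_conf` is a hypothetical name, and the GUID check assumes the tenant is given as a GUID rather than as a domain name.

```python
import uuid


def adls_gen1_oauth_conf(client_id, credential, tenant_id):
    """Return the ADLS Gen1 OAuth settings as a dict (hypothetical helper)."""
    # Assumption: tenant_id is a GUID. uuid.UUID raises ValueError for
    # anything else, such as a placeholder that was never filled in.
    uuid.UUID(tenant_id)
    return {
        "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        "dfs.adls.oauth2.client.id": client_id,
        "dfs.adls.oauth2.credential": credential,
        "dfs.adls.oauth2.refresh.url":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# In a notebook, where `spark` is already defined, the settings could be
# applied like this (the argument values are placeholders):
# for key, value in adls_gen1_oauth_conf(app_id, app_key, tenant).items():
#     spark.conf.set(key, value)
```

Keeping the values in one dict makes it easy to print and inspect the refresh URL before the first call to the lake.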
The execution fails with this error:
com.microsoft.azure.datalake.store.ADLException: Error enumerating directory /
Operation null failed with exception java.io.IOException : Server returned HTTP response code: 400 for URL: https://login.microsoftonline.com/using my-tenant-id/oauth2/token
Last encountered exception thrown after 5 tries. [java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException] [ServerRequestId:null]
    at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectoryInternal(ADLStoreClient.java:558)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:534)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:398)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:384)
I have given the registered application the Reader role on the Data Lake:
Question
How can I allow Spark to access the Data Lake?
Update
I have granted both the tenantdata and events folders Read and Execute access:
Upvotes: 0
Views: 4381
Reputation: 2473
The RBAC roles on the Gen1 lake do not grant access to the data, only to the resource itself. The exception is the Owner role, which grants Super User access and therefore full data access.
You must grant access to the folders and files themselves with POSIX-style permissions, using either Data Explorer in the Portal or Azure Storage Explorer (a separate download).
This guide explains in detail how to do that: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control
Reference: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data
Only the Owner role automatically enables file system access. The Contributor, Reader, and all other roles require ACLs to enable any level of access to folders and files.
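As a rough illustration of what those ACLs need to cover, here is a minimal sketch that lists the permissions required to enumerate a folder. It assumes the usual POSIX-style rule described in the access-control guide: Execute on every ancestor folder, including the root, and Read plus Execute on the folder being listed. The function name `acls_needed_to_list` is hypothetical and the code does not talk to Azure at all.

```python
import posixpath


def acls_needed_to_list(folder):
    """Map each folder on the path to the permission string it needs."""
    folder = posixpath.normpath("/" + folder.strip("/"))
    needed = {}
    current = folder
    # Every ancestor, up to and including the root, must be traversable.
    while current != "/":
        current = posixpath.dirname(current)
        needed[current] = "--x"
    # The folder being listed needs Read as well as Execute.
    needed[folder] = "r-x"
    return dict(sorted(needed.items()))


for path, perm in acls_needed_to_list("/tenantdata/events").items():
    print(perm, path)
# --x /
# --x /tenantdata
# r-x /tenantdata/events
```

For the path in the question, this suggests checking that the application also has Execute on the root folder, not only on tenantdata and events.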
Upvotes: 1