Reputation: 91
I am trying to mount ADLS Gen2 in Databricks with the following configuration:
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "service principal id",
           "fs.azure.account.oauth2.client.secret": "service principal key",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/tenant-id/oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
    source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/directory",
    mount_point = "/mnt/data",
    extra_configs = configs)
I have created the service principal and created a key for it, then assigned a Storage Blob role to this service principal in the Active Directory role assignments.
As per the documentation:
"abfss://<your-file-system-name>@<your-storage-account-name>.dfs.core.windows.net/<your-directory-name>"
What should <your-file-system-name> be ==> the folder inside the blob container? And <your-directory-name> ==> I have only one folder inside the blob container, so I am confused here. My layout is: storage account (ADLS Gen2 preview) > blob container > folder > a.txt
Error:
ExecutionError: An error occurred while calling o480.mount.
HEAD https://xxxxxxxxx.dfs.core.windows.net/xxxxxx?resource=filesystem&timeout=90
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=
ErrorMessage=
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:134)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getFilesystemProperties(AbfsClient.java:197)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFilesystemProperties(AzureBlobFileSystemStore.java:214)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.fileSystemExists(AzureBlobFileSystem.java:749)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:110)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:485)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:435)
    at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Upvotes: 9
Views: 71162
Reputation: 21
I ran into this issue too after migrating the storage account from blob storage to Data Lake Storage Gen2.
Turns out you need a separate private endpoint for each storage resource that you need to access, namely Blobs, Data Lake Storage Gen2, Files, Queues, Tables, or Static Websites. On the private endpoint, these storage services are defined as the target sub-resource of the associated storage account (privatelink.dfs.core.windows.net).
https://learn.microsoft.com/en-us/azure/storage/common/storage-private-endpoints
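For reference, here is a minimal sketch of what creating such a private endpoint looks like with the Azure SDK for Python (assuming a recent track-2 azure-mgmt-network package); every name and ID below is a placeholder, and the important part is the dfs target sub-resource:

from azure.identity import AzureCliCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"
network_client = NetworkManagementClient(AzureCliCredential(), subscription_id)

# Create a private endpoint whose target sub-resource ("group id") is "dfs",
# i.e. the Data Lake Storage Gen2 endpoint of the storage account.
poller = network_client.private_endpoints.begin_create_or_update(
    "<resource-group>",
    "<private-endpoint-name>",
    {
        "location": "<region>",
        "subnet": {"id": "<subnet-resource-id>"},
        "private_link_service_connections": [
            {
                "name": "dfs-connection",
                "private_link_service_id": "<storage-account-resource-id>",
                "group_ids": ["dfs"],  # blob, file, queue, etc. each need their own endpoint
            }
        ],
    },
)
poller.result()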
Upvotes: 2
Reputation: 85
Maybe the problem is that you need to grant permissions on the container to the service principal name (app registration).
To do this, you need to open the container's Access Control (IAM) in the Azure portal and add a role assignment (for example, Storage Blob Data Contributor) for that app registration.
That's it.
I hope it helps someone else.
Upvotes: 1
Reputation: 11639
I used to have a similar issue. My storage account is Gen2 and it contains two file systems and one normal container.
Then I gave the service-principal-app the role --> Storage Blob Data Contributor and it solved my problem. Now I have access from Databricks to the mounted containers.
Here is how to give permissions to the service-principal-app: in the storage account's Access Control (IAM), add a role assignment with the role Storage Blob Data Contributor and assign it to the app registration.
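If you prefer to script it rather than use the portal, here is a rough sketch with the Azure SDK for Python (assuming a recent azure-mgmt-authorization package); all IDs are placeholders, and ba92f5b4-2d11-453d-a403-e96b0029c9fe is the built-in role definition GUID for Storage Blob Data Contributor:

import uuid
from azure.identity import AzureCliCredential
from azure.mgmt.authorization import AuthorizationManagementClient

subscription_id = "<subscription-id>"
scope = ("/subscriptions/" + subscription_id +
         "/resourceGroups/<resource-group>"
         "/providers/Microsoft.Storage/storageAccounts/<storage-account-name>")

# Built-in role definition ID for "Storage Blob Data Contributor"
role_definition_id = ("/subscriptions/" + subscription_id +
                      "/providers/Microsoft.Authorization/roleDefinitions/"
                      "ba92f5b4-2d11-453d-a403-e96b0029c9fe")

client = AuthorizationManagementClient(AzureCliCredential(), subscription_id)
client.role_assignments.create(
    scope,
    str(uuid.uuid4()),  # the role assignment name must be a new GUID
    {
        "role_definition_id": role_definition_id,
        "principal_id": "<service-principal-object-id>",
        "principal_type": "ServicePrincipal",
    },
)

Note that this uses the service principal's object ID (not the application/client ID), and a new role assignment can take a few minutes to propagate before the mount succeeds.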
Upvotes: 5
Reputation: 41
We had a similar error: using RBAC we had given the Owner role to the service principal on the storage account, and it did not work. You must grant the role(s) listed here in order to access the directory/container: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model#role-based-access-control-azure-rbac
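As a quick check (a sketch using the Azure SDK for Python; every ID below is a placeholder), you can list which roles the service principal actually holds on the storage account and confirm that a data-plane role such as Storage Blob Data Contributor is among them:

from azure.identity import AzureCliCredential
from azure.mgmt.authorization import AuthorizationManagementClient

subscription_id = "<subscription-id>"
scope = ("/subscriptions/" + subscription_id +
         "/resourceGroups/<resource-group>"
         "/providers/Microsoft.Storage/storageAccounts/<storage-account-name>")

client = AuthorizationManagementClient(AzureCliCredential(), subscription_id)
# Owner/Contributor alone is not enough for data access; look for a
# "Storage Blob Data ..." role definition in the results.
for assignment in client.role_assignments.list_for_scope(
        scope, filter="principalId eq '<service-principal-object-id>'"):
    print(assignment.role_definition_id)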
Upvotes: 4
Reputation: 21
I've just struggled with this and have corrected this setting name:
"fs.azure.account.oauth.provider.type"
to
"fs.azure.account.oauth2.provider.type"
Execution was successful.
Upvotes: -3
Reputation: 2473
Gen2 lakes do not have containers, they have file systems (which are a very similar concept).
On your storage account, have you enabled the "Hierarchical namespace" feature? You can see this in the Configuration blade of the storage account. If you have, then the storage account is a Lake Gen2; if not, it is simply a blob storage account and you need to follow the instructions for using blob storage.
Assuming you have set that feature then you can see the FileSystems blade - in there you create file systems, in a very similar way to blob containers. This is the name you need at the start of your abfss URL.
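For example (a sketch with made-up names, reusing the configs dict from the question): if the file system you created is called myfilesystem and it contains a folder called folder, the mount would look like this:

# "myfilesystem" and "folder" are hypothetical names; <storage-account-name> is a placeholder.
dbutils.fs.mount(
    source = "abfss://myfilesystem@<storage-account-name>.dfs.core.windows.net/folder",
    mount_point = "/mnt/data",
    extra_configs = configs)  # the same OAuth configs from the question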
However, the error message you have indicates to me that your service principal does not have permission on the data lake. You should either grant permission using an RBAC role on the storage account resource (add it to the storage account contributors or readers), or use Storage Explorer to grant permission at a more granular level.
Remember that the data lake requires execute permissions on every folder from the root down to the folder you are trying to read from or write to. As a test, try reading a file from the root first.
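A sketch of such a test (every angle-bracket value is a placeholder): set the same OAuth credentials directly on the Spark session and list the root of the file system without mounting:

account = "<storage-account-name>.dfs.core.windows.net"
spark.conf.set("fs.azure.account.auth.type." + account, "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type." + account,
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id." + account, "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret." + account, "<client-secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint." + account,
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# If RBAC and the folder ACLs are in place, this lists the root without a 403.
dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/")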
Upvotes: 5