Reputation: 47
I am trying to mount Azure Data Lake Gen 2 in Databricks and am getting the error shown below.
java.lang.NullPointerException: authEndpoint
The code I am using is shown below
configs = {
"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.auth.provider.type": "org.apache.hadoop.fs.azurebfs.ClientCredsTokenProvider",
"fs.azure.account.auth2.client.id": "<client-id>",
"fs.azure.account.auth2.client.secret": dbutils.secrets.get(scope = "scope1", key = "kvsecretfordbricks"),
"dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}
dbutils.fs.mount(
source = "abfss://[email protected]/",
mount_point = "/mnt/demo",
extra_configs = configs)
The full error is given below
---------------------------------------------------------------------------
ExecutionError Traceback (most recent call last) in
      9   source = "abfss://[email protected]/",
     10   mount_point = "/mnt/demo",
---> 11   extra_configs = configs)

/local_disk0/tmp/1612619970782-0/dbutils.py in f_with_exception_handling(*args, **kwargs)
    312         exc.__context__ = None
    313         exc.__cause__ = None
--> 314         raise exc
    315     return f_with_exception_handling
    316

ExecutionError: An error occurred while calling o271.mount.
: java.lang.NullPointerException: authEndpoint
    at shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenUsingClientCreds(AzureADAuthenticator.java:84)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureOAuth(DBUtilsCore.scala:477)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:488)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:446)
    at sun.reflect.GeneratedMethodAccessor292.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Any help would be appreciated
When I run
dbutils.fs.unmount("/mnt")
it reports that there are no mount points beginning with "/mnt".
--
UPDATE
Additional error message after changing dfs.adls.oauth2.refresh.url to fs.azure.account.oauth2.client.endpoint:
ExecutionError Traceback (most recent call last) in
      9   source = "abfss://[email protected]/",
     10   mount_point = "/mnt/demo",
---> 11   extra_configs = configs)

/local_disk0/tmp/1612858508533-0/dbutils.py in f_with_exception_handling(*args, **kwargs)
    312         exc.__context__ = None
    313         exc.__cause__ = None
--> 314         raise exc
    315     return f_with_exception_handling
    316

ExecutionError: An error occurred while calling o275.mount.
: java.lang.NullPointerException: clientId
    at shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenUsingClientCreds(AzureADAuthenticator.java:85)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureOAuth(DBUtilsCore.scala:477)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:488)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:446)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Upvotes: 1
Views: 13644
Reputation: 23
I faced a similar issue on the free tier. After some research I found that the free tier itself was not the problem. I recreated the client secret, client ID and tenant ID entries in Azure Key Vault, then recreated the secret scope in Databricks, and now everything seems to be working fine.
Anyone facing this issue can follow Microsoft's tutorial "Connect to Azure Data Lake Storage Gen2 using service principal" to set this up correctly:
https://learn.microsoft.com/en-us/azure/databricks/getting-started/connect-to-azure-storage
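As a rough illustration of guarding against empty or missing credentials before mounting, here is a minimal sketch. The helper name build_abfs_oauth_configs is hypothetical and does not come from the tutorial; the configuration keys and the endpoint URL pattern are the ones used elsewhere in this thread. It assumes the three values have already been read, for example with dbutils.secrets.get.

```python
def build_abfs_oauth_configs(client_id, client_secret, tenant_id):
    """Build the OAuth mount configs, failing early on empty credentials.

    Hypothetical helper for illustration; not part of dbutils.
    """
    values = {
        "client_id": client_id,
        "client_secret": client_secret,
        "tenant_id": tenant_id,
    }
    # Collect the names of any values that are None, empty, or whitespace only.
    empty = [name for name, value in values.items() if not value or not value.strip()]
    if empty:
        raise ValueError("empty credential value(s): " + ", ".join(empty))

    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
```

In a notebook, the secret would typically be passed as client_secret=dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"), so a scope or key that resolves to an empty value fails with a readable message before the mount call is made.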
Upvotes: 2
Reputation: 23141
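To mount an Azure Data Lake Storage Gen2 account to DBFS, replace the key dfs.adls.oauth2.refresh.url with fs.azure.account.oauth2.client.endpoint. The dfs.adls.* keys belong to the Data Lake Storage Gen1 connector, so the ABFS driver used for Gen2 never receives an endpoint, which is why authEndpoint is null. For more details, please refer to the official documentation.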
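One way to see why the question's dictionary still fails after the endpoint key is renamed (the follow-up NullPointerException: clientId) is to compare its keys with the ones used in the mount example in this answer. The following is a minimal sketch, not part of Databricks or Hadoop: check_abfs_oauth_keys and REQUIRED_KEYS are hypothetical names, and the required set is simply copied from that example.

```python
# Keys used by the OAuth (client credentials) mount example in this answer.
REQUIRED_KEYS = {
    "fs.azure.account.auth.type",
    "fs.azure.account.oauth.provider.type",
    "fs.azure.account.oauth2.client.id",
    "fs.azure.account.oauth2.client.secret",
    "fs.azure.account.oauth2.client.endpoint",
}


def check_abfs_oauth_keys(configs):
    """Return (missing, unexpected) sorted key lists for a mount config dict."""
    present = set(configs)
    missing = sorted(REQUIRED_KEYS - present)
    unexpected = sorted(present - REQUIRED_KEYS)
    return missing, unexpected


# The dictionary from the question, with the secret replaced by a placeholder.
question_configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.auth.provider.type": "org.apache.hadoop.fs.azurebfs.ClientCredsTokenProvider",
    "fs.azure.account.auth2.client.id": "<client-id>",
    "fs.azure.account.auth2.client.secret": "<secret>",
    "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

missing, unexpected = check_abfs_oauth_keys(question_configs)
print("missing:", missing)
print("unexpected:", unexpected)
```

The check flags the auth2 and auth.provider.type spellings as unexpected alongside the refresh URL key. It only compares keys; the provider class name in the question also lacks the oauth2 package segment that the value in this answer has.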
For example:
1. Create an Azure Data Lake Storage Gen2 account (a StorageV2 account with the hierarchical namespace enabled)
az login
az storage account create \
    --name <account-name> \
    --resource-group <group name> \
    --location westus \
    --sku Standard_RAGRS \
    --kind StorageV2 \
    --enable-hierarchical-namespace true
2. Create a service principal and assign it the Storage Blob Data Contributor role on the storage account
az login
az ad sp create-for-rbac -n "MyApp" --role "Storage Blob Data Contributor" \
    --scopes /subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>
3. Create a Spark cluster in Azure Databricks
4. Mount Azure Data Lake Gen2 in Azure Databricks (Python)
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token",
}
# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
    source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point = "/mnt/demo",
    extra_configs = configs)
dbutils.fs.ls("/mnt/demo")
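If the notebook cell may be run more than once, it can help to check the existing mounts first instead of calling unmount blindly. The following is a minimal sketch under the assumption that dbutils.fs.mounts() returns entries with a mountPoint attribute, as it does in Databricks notebooks; mount_if_absent is a hypothetical helper name, and dbutils is passed in explicitly so the function can be read outside a notebook.

```python
def mount_if_absent(dbutils, source, mount_point, configs):
    """Mount `source` at `mount_point` unless that mount point already exists.

    Returns True if a new mount was created, False if it was already there.
    Hypothetical helper for illustration; not part of dbutils.
    """
    # Collect the mount points that are currently registered.
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    return True
```

In a notebook this would be called as mount_if_absent(dbutils, "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/", "/mnt/demo", configs). Note that it compares exact mount points only, so "/mnt" and "/mnt/demo" are treated as different entries.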
Upvotes: 1