user15037183

Reputation: 47

Error Mounting ADLS on DBFS for Databricks (Error: NullPointerException)

I am trying to mount Azure Data Lake Storage Gen2 in Databricks and am getting the error shown below.

java.lang.NullPointerException: authEndpoint

The code I am using is shown below:

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.auth.provider.type": "org.apache.hadoop.fs.azurebfs.ClientCredsTokenProvider",
  "fs.azure.account.auth2.client.id": "<client-id>",
  "fs.azure.account.auth2.client.secret": dbutils.secrets.get(scope = "scope1", key = "kvsecretfordbricks"),
  "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/demo",
  extra_configs = configs)

The full error is given below:

---------------------------------------------------------------------------
ExecutionError                            Traceback (most recent call last)
in
      9     source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
     10     mount_point = "/mnt/demo",
---> 11     extra_configs = configs)

/local_disk0/tmp/1612619970782-0/dbutils.py in f_with_exception_handling(*args, **kwargs)
    312                 exc.__context__ = None
    313                 exc.__cause__ = None
--> 314                 raise exc
    315         return f_with_exception_handling
    316

ExecutionError: An error occurred while calling o271.mount.
: java.lang.NullPointerException: authEndpoint
    at shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenUsingClientCreds(AzureADAuthenticator.java:84)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureOAuth(DBUtilsCore.scala:477)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:488)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:446)
    at sun.reflect.GeneratedMethodAccessor292.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)

Any help would be appreciated.

When I run

dbutils.fs.unmount("/mnt")

it reports that there are no mount points beginning with "/mnt".
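For reference, a quick way to enumerate whatever mount points do exist is the standard dbutils.fs.mounts() API:

for m in dbutils.fs.mounts():
    # each entry exposes mountPoint and source
    print(m.mountPoint, "->", m.source)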

--

UPDATE

Additional error message after updating dfs.adls.oauth2.refresh.url to fs.azure.account.oauth2.client.endpoint:

ExecutionError                            Traceback (most recent call last)
in
      9     source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
     10     mount_point = "/mnt/demo",
---> 11     extra_configs = configs)

/local_disk0/tmp/1612858508533-0/dbutils.py in f_with_exception_handling(*args, **kwargs)
    312                 exc.__context__ = None
    313                 exc.__cause__ = None
--> 314                 raise exc
    315         return f_with_exception_handling
    316

ExecutionError: An error occurred while calling o275.mount.
: java.lang.NullPointerException: clientId
    at shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenUsingClientCreds(AzureADAuthenticator.java:85)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureOAuth(DBUtilsCore.scala:477)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:488)
    at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:446)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)

Upvotes: 1

Views: 13644

Answers (2)

Farrukh zaman Khan

Reputation: 23

I faced a similar issue on the free tier. After much research and deliberation, I found that there is no problem with the free tier itself. I recreated the client secret, client ID, and tenant ID entries in Azure Key Vault, then recreated the secret scope in Databricks, and now everything works fine.
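Before retrying the mount, it can be worth confirming that the recreated scope and secrets are actually visible to the workspace. A minimal check with the standard dbutils secrets API (the scope and key names here are the ones from the question; yours may differ):

print(dbutils.secrets.listScopes())    # the recreated scope should appear here
print(dbutils.secrets.list("scope1"))  # keys available inside that scope
# this raises an exception if the scope or key is misconfigured:
dbutils.secrets.get(scope="scope1", key="kvsecretfordbricks")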

Anyone who faces such an issue can refer to Microsoft's tutorial "Connect to Azure Data Lake Storage Gen2 using a service principal" to set this up correctly:

https://learn.microsoft.com/en-us/azure/databricks/getting-started/connect-to-azure-storage

or

https://deep.data.blog/2019/03/28/avoiding-error-403-request-not-authorized-when-accessing-adls-gen-2-from-azure-databricks-while-using-a-service-principal/

Upvotes: 2

Jim Xu

Reputation: 23141

If you want to mount an Azure Data Lake Storage Gen2 account to DBFS, rename dfs.adls.oauth2.refresh.url to fs.azure.account.oauth2.client.endpoint. The clientId error in your update comes from the same kind of key-name mistake: fs.azure.account.auth2.client.id and fs.azure.account.auth2.client.secret should be fs.azure.account.oauth2.client.id and fs.azure.account.oauth2.client.secret, and the provider should be configured as fs.azure.account.oauth.provider.type with the class org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider (note the oauth2 package). For more details, please refer to the official documentation.
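Applied to the configs dict from the question, only the key names and the provider class path change (placeholders and the secret lookup kept exactly as in the question):

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<client-id>",
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope = "scope1", key = "kvsecretfordbricks"),
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"}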

A complete walkthrough, end to end:

  1. Create an Azure Data Lake Storage Gen2 account.
az login
az storage account create \
    --name <account-name> \
    --resource-group <group name> \
    --location westus \
    --sku Standard_RAGRS \
    --kind StorageV2 \
    --enable-hierarchical-namespace true
  2. Create a service principal and assign the Storage Blob Data Contributor role to it, scoped to the Data Lake Storage Gen2 storage account (the command's output maps onto the mount configuration, as sketched after the command):
az login

az ad sp create-for-rbac -n "MyApp" --role "Storage Blob Data Contributor" \
    --scopes /subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>
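The JSON printed by az ad sp create-for-rbac contains appId, password, and tenant; these map onto the mount configuration used below (shown as a sketch in comments; in practice, store the password in a Key Vault-backed secret scope rather than pasting it into a notebook):

# Output shape of `az ad sp create-for-rbac` (values redacted):
#   "appId":    <application-id>  -> fs.azure.account.oauth2.client.id
#   "password": <client-secret>   -> fs.azure.account.oauth2.client.secret (via a secret scope)
#   "tenant":   <directory-id>    -> used in the fs.azure.account.oauth2.client.endpoint URL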
  3. Create a Spark cluster in Azure Databricks.

  4. Mount Azure Data Lake Storage Gen2 in Azure Databricks (Python):

configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "<application-id>",
           "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/demo",
  extra_configs = configs)


  5. Check the mount:
dbutils.fs.ls("/mnt/demo")

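Once the mount succeeds, reading through it with Spark should work as well; for example (the file name here is a hypothetical placeholder for an object that exists in your container):

# <your-file>.csv is a placeholder, not a file created by the steps above
df = spark.read.csv("/mnt/demo/<your-file>.csv", header=True)
df.show()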

Upvotes: 1
