btrose

Reputation: 11

Azure Workload Identity with Spark on Kubernetes

How can I configure Spark to use Azure Workload Identity to access storage from AKS pods, rather than having to pass a client secret?

I am able to successfully pass these properties and connect to ADLS Gen 2 containers:

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

However, I would like to take advantage of workload identity and not have to pass any secret. I've also tried following the Hadoop recommendations for managed identity, but to no avail: https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html#Azure_Managed_Identity

<property>
  <name>fs.azure.account.auth.type</name>
  <value>OAuth</value>
  <description>
  Use OAuth authentication
  </description>
</property>
<property>
  <name>fs.azure.account.oauth.provider.type</name>
  <value>org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider</value>
  <description>
  Use MSI for issuing OAuth tokens
  </description>
</property>
<property>
  <name>fs.azure.account.oauth2.msi.tenant</name>
  <value></value>
  <description>
  Optional MSI Tenant ID
  </description>
</property>
<property>
  <name>fs.azure.account.oauth2.msi.endpoint</name>
  <value></value>
  <description>
   MSI endpoint
  </description>
</property>
<property>
  <name>fs.azure.account.oauth2.client.id</name>
  <value></value>
  <description>
  Optional Client ID
  </description>
</property>

When we try the above properties, we get back the error below, returned as HTML.

[Image: error from using managed identity properties]
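
For reference, the managed identity properties quoted above can be scoped to one storage account in the same way as the client-secret settings. The following is a minimal sketch, not a verified fix: the helper name `msi_abfs_conf` and its parameters are hypothetical, and it assumes the per-account suffix works for the MSI properties as it does for the OAuth properties shown earlier. Optional values are only emitted when provided, so empty `<value></value>` entries are not passed through.

```python
from typing import Dict, Optional


def msi_abfs_conf(storage_account: str,
                  client_id: Optional[str] = None,
                  tenant_id: Optional[str] = None,
                  msi_endpoint: Optional[str] = None) -> Dict[str, str]:
    """Build per-account ABFS properties for MsiTokenProvider (hypothetical helper)."""
    # Same account suffix as in the client-secret configuration.
    suffix = f"{storage_account}.dfs.core.windows.net"
    conf = {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider",
    }
    # The Hadoop documentation describes these three properties as optional,
    # so they are skipped when no value is given.
    optional = {
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.msi.tenant": tenant_id,
        "fs.azure.account.oauth2.msi.endpoint": msi_endpoint,
    }
    for key, value in optional.items():
        if value:
            conf[f"{key}.{suffix}"] = value
    return conf


# Usage inside a Spark session (assumes an existing `spark` object):
# for key, value in msi_abfs_conf("<storage-account>", client_id="<application-id>").items():
#     spark.conf.set(key, value)
```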

Upvotes: 1


Answers (1)

Ulky Igor

Reputation: 362

Secure access to Azure resources from AKS (using managed identities) was formerly handled by integrating aad-pod-identity into your AKS cluster.
With that approach, you need to make sure your cluster supports aad-pod-identity and configure your workload and Kubernetes resources (pods, service accounts, etc.) accordingly.
See https://azure.github.io/aad-pod-identity/docs/demo/standard_walkthrough/

However, aad-pod-identity was marked deprecated in early 2023 and replaced by Azure Workload Identity, and the hadoop-azure project does not yet support Workload Identity. A Jira ticket for that work is pending, though: https://issues.apache.org/jira/browse/HADOOP-18610
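
To illustrate what a token provider for Workload Identity would have to do, here is a rough sketch of the token exchange. It assumes the pod has been mutated by the Azure Workload Identity webhook, which injects the `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_AUTHORITY_HOST` and `AZURE_FEDERATED_TOKEN_FILE` environment variables. The function name `build_token_request` is hypothetical, and the code only builds the request; it does not send it and does not make Spark or hadoop-azure use the resulting token.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Tuple


def build_token_request(
    env: Mapping[str, str] = os.environ,
    scope: str = "https://storage.azure.com/.default",
) -> Tuple[str, Dict[str, str]]:
    """Return (url, form fields) for the federated-credential token exchange."""
    authority = env["AZURE_AUTHORITY_HOST"].rstrip("/")
    tenant_id = env["AZURE_TENANT_ID"]
    # The projected service account token is used as the client assertion,
    # taking the place of the client secret.
    assertion = Path(env["AZURE_FEDERATED_TOKEN_FILE"]).read_text().strip()
    url = f"{authority}/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": env["AZURE_CLIENT_ID"],
        "client_assertion_type":
            "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": assertion,
        "scope": scope,
    }
    return url, form
```

POSTing the form fields to the returned URL as `application/x-www-form-urlencoded` should yield an access token for the storage scope, which is the step hadoop-azure would need to perform internally.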

Upvotes: 1
