Reputation: 11
Recently, Microsoft released a way for pandas to read/write Azure Data Lake Storage Gen2 data in a serverless Apache Spark pool in Synapse Analytics, as described here: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/tutorial-use-pandas-spark-pool
If I want to use the same strategy with PySpark in Azure Databricks, how can I use the Data Lake secret (stored in Azure Key Vault) containing the account key so that pandas can access the Data Lake smoothly? That way, I don't have to expose the secret value in the Databricks notebook.
Upvotes: 1
Views: 388
Reputation: 87069
For Azure Databricks, you just need to create a secret scope backed by your Azure Key Vault, and then you can use the dbutils.secrets.get function to retrieve a secret from that scope or inject the secrets into a Spark conf.
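For the pandas side of the question, one possible pattern (a minimal sketch, not an official recipe) is to pass the retrieved key to pandas through fsspec's storage_options, assuming the adlfs package is available on the cluster; the scope, secret, account, container, and path names below are all placeholders:

```python
import pandas as pd

# Retrieve the storage account key from the Key Vault-backed secret scope,
# so the key value never appears in the notebook.
# (dbutils is provided automatically by the Databricks notebook environment.)
account_key = dbutils.secrets.get(scope="keyvault-scope", key="adls-account-key")

# pandas reads directly from ADLS Gen2 through the fsspec/adlfs backend;
# the account key is supplied via storage_options instead of being hard-coded.
df = pd.read_csv(
    "abfs://<container>@<storage-account>.dfs.core.windows.net/path/to/file.csv",
    storage_options={"account_key": account_key},
)
```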
Please note that you will need to set the correct Spark configuration to use that storage account key; refer to the documentation for details (Blob Storage, ADLS Gen2).
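For example, a sketch of wiring the retrieved key into the Spark conf for an ADLS Gen2 account (the storage account, container, and path are placeholders):

```python
# Retrieve the account key from the Key Vault-backed secret scope
# (scope and secret names are illustrative).
account_key = dbutils.secrets.get(scope="keyvault-scope", key="adls-account-key")

# Configure Spark to authenticate to the ADLS Gen2 account with that key.
spark.conf.set(
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    account_key,
)

# Spark can now read from the account via the abfss scheme.
df = spark.read.csv(
    "abfss://<container>@<storage-account>.dfs.core.windows.net/path/to/file.csv",
    header=True,
)
```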
Upvotes: 0