Jokin
Jokin

Reputation: 103

Connecting from Azure Databricks to Azure SQL using User Managed Identity

I am trying to read data on an Azure SQL instance from an Azure Databricks workspace, avoiding using username/password personal credentials for automated, regular data fetch & analysis. I thought using a managed identity would do the job, however it looks to be less smooth than with Azure Functions or Web Services. Is this supported in Databricks?

I would need environment variables that do not exist in the Databricks instance, like IDENTITY_ENDPOINT and IDENTITY_HEADER, following the doc https://learn.microsoft.com/en-us/azure/app-service/overview-managed-identity

Any insight would be greatly appreciated!

Upvotes: 10

Views: 3278

Answers (3)

Bob
Bob

Reputation: 1

I was able to resolve the CREATE CONNECTION with the Microsoft documentation.

https://learn.microsoft.com/en-us/azure/databricks/query-federation/sql-server

The user account you use to test the connection must have read/write permissions in the database. Also add your Databricks instance with read/write permissions.

Upvotes: 0

Claire Furney
Claire Furney

Reputation: 2378

I successfully connected from a Databricks workspace to Azure SQL database using a Managed Identity from a Python Notebook. (Note: no joy yet via the CREATE CONNECTION functionality which only seems to support SQL auth at present.)

When the workspace is created, a "managed" resource group is created along with a user assigned managed identity, visible under the resources of this managed resource group. This user assigned identity, dbmanagedidentity is assigned to the VMs which are provisioned when starting a cluster. You can give this identity access to your SQL database in the usual way1.

If your tenant has multiple dbmanagedidentity users, then you'll additionally need to use the WITH OBJECT_ID clause2 to differentiate it (look up the Object ID from the managed identity resource).

1. Create "contained" database user

CREATE USER [dbmanagedidentity] FROM EXTERNAL PROVIDER WITH OBJECT_ID = '[object-id-guid-for-dbmanagedidentity]'

2. Give it the permissions it needs

ALTER ROLE db_datareader ADD MEMBER [dbmanagedidentity];

3. Connect via python notebook

jdbc_url = "jdbc:sqlserver://sql-example.database.windows.net;database=sqldb-example;Authentication=ActiveDirectoryMSI;"
tbl = (spark.read
  .format("jdbc")
  .option("url", jdbc_url)
  .option("dbtable", "turtles")
  .load()
)

Helpful SO answer giving further insight into FROM EXTERNAL PROVIDER: https://stackoverflow.com/a/76996864/1265167


1 https://learn.microsoft.com/en-us/azure/azure-sql/database/authentication-aad-configure?view=azuresql&tabs=azure-powershell#create-contained-users-mapped-to-microsoft-entra-identities

2 https://learn.microsoft.com/en-us/sql/t-sql/statements/create-user-transact-sql?view=sql-server-ver16#with-object_id--objectid

Upvotes: 3

Carlos Ataíde
Carlos Ataíde

Reputation: 46

Those environment variables are there for added security and flexibility. When using a Managed Identity, you will have to request the access token to different providers, but if you want to simplify things, you can just hard code it like the sample below.

The IP address should never change, and if you're accessing other services besides SQL Database, just replace the "https://database.windows.net/" at the end. This is what the IDENTITY_ENDPOINT env variable is meant to be.

Once you have the access token, you're ready to access the SQL Server.

$response = Invoke-WebRequest -Uri 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://database.windows.net/' -Method GET -Headers @{Metadata="true"}
$content = $response.Content | ConvertFrom-Json
$content.access_token

Upvotes: 0

Related Questions