Hubert

Reputation: 131

Databricks file trigger - how to whitelist it in the storage firewall

Recently, Databricks added a new feature - file arrival triggers. However, this functionality seems to require the storage account to allow all network traffic.

My storage account has a firewall configured that denies traffic from unknown sources. The Databricks workspace is deployed to our internal network - we are using VNet injection. All the necessary subnets are whitelisted, and storage generally works fine, but not with the file trigger. If I turn off the storage firewall, the file trigger works fine. The external location and the Azure Databricks access connector are configured correctly.
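As a sketch of the setup (assuming the azure-mgmt-storage and azure-identity Python packages; the subscription, resource group and account names below are placeholders, not from my environment), the rules currently allowed through the storage firewall can be listed like this:

    # Hypothetical sketch: list the storage account firewall rules to verify
    # which subnets / resource instances are currently whitelisted.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient

    SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
    RESOURCE_GROUP = "<resource-group>"     # placeholder
    STORAGE_ACCOUNT = "<storage-account>"   # placeholder

    client = StorageManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
    account = client.storage_accounts.get_properties(RESOURCE_GROUP, STORAGE_ACCOUNT)
    rules = account.network_rule_set

    print("default action:", rules.default_action)          # "Deny" while the firewall is on
    for vnet_rule in rules.virtual_network_rules or []:
        print("subnet:", vnet_rule.virtual_network_resource_id)
    for res_rule in rules.resource_access_rules or []:
        print("resource instance:", res_rule.resource_id)    # access connectors show up here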

The error I get:

Invalid credentials for storage location abfss://@.dfs.core.windows.net/. The credentials for the external location in the Unity Catalog cannot be used to read the files from the configured path. Please grant the required permissions.

If I look at the logs in my storage account, it looks like the file trigger lists the storage account from a private IP address starting with 10.120.x.x. How do I whitelist this service? I want to keep my storage behind the firewall.

Upvotes: 2

Views: 874

Answers (2)

Martinilol

Reputation: 11

According to the Databricks Monthly Customer Newsletter (Feb 2024), this issue is fixed in the GA release:

File arrival triggers are now generally available on all cloud providers. With this release, you can use file arrival triggers to run an Azure Databricks job when new files arrive in a Unity Catalog volume, in addition to the existing support for Unity Catalog external locations. See Trigger jobs when new files arrive.

This release also removes a limitation with using file arrival triggers with an Azure firewall.

Upvotes: 1

Alex Ott

Reputation: 87259

Update 3rd April 2023: the ADLS firewall isn't supported out of the box right now; work is in progress to solve this issue.

It's described in the documentation - you need to:

  • Create a managed identity by creating a Databricks access connector
  • Grant this managed identity permission to access your storage account
  • Create a UC external location using the managed identity
  • Grant the access connector access to your storage account - in "Networking", select "Resource instances", then select a Resource type of Microsoft.Databricks/accessConnectors and select your Azure Databricks access connector (a programmatic sketch of this last step follows below).
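For the last step, here is a minimal sketch of the same "Resource instances" rule done programmatically, assuming the azure-mgmt-storage and azure-identity Python packages (the portal steps above are the documented route; all IDs and names below are placeholders):

    # Hypothetical sketch: add the Databricks access connector as a "Resource
    # instance" rule on the storage account firewall, instead of clicking
    # through "Networking" in the portal.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient
    from azure.mgmt.storage.models import (
        NetworkRuleSet,
        ResourceAccessRule,
        StorageAccountUpdateParameters,
    )

    SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
    RESOURCE_GROUP = "<resource-group>"     # placeholder
    STORAGE_ACCOUNT = "<storage-account>"   # placeholder
    TENANT_ID = "<tenant-id>"               # placeholder
    ACCESS_CONNECTOR_ID = (
        "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/"
        "Microsoft.Databricks/accessConnectors/<connector-name>"  # placeholder
    )

    client = StorageManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # Read the existing rule set first so the update doesn't wipe the current
    # subnet/IP whitelist; only append the access connector.
    current = client.storage_accounts.get_properties(
        RESOURCE_GROUP, STORAGE_ACCOUNT
    ).network_rule_set
    resource_rules = list(current.resource_access_rules or [])
    resource_rules.append(
        ResourceAccessRule(tenant_id=TENANT_ID, resource_id=ACCESS_CONNECTOR_ID)
    )

    client.storage_accounts.update(
        RESOURCE_GROUP,
        STORAGE_ACCOUNT,
        StorageAccountUpdateParameters(
            network_rule_set=NetworkRuleSet(
                default_action=current.default_action,   # keep "Deny"
                bypass=current.bypass,
                virtual_network_rules=current.virtual_network_rules,
                ip_rules=current.ip_rules,
                resource_access_rules=resource_rules,
            )
        ),
    )

The read-before-update is deliberate: the update replaces the whole network rule set, so re-sending the existing subnet and IP rules keeps the current whitelist intact while adding the connector.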

Upvotes: 2
