Tim H.

Reputation: 11

How do I add Azure Synapse IP address range to ADLS firewall?

I would like a Synapse notebook to read ADLS blob data from outside the managed VNet, but I am getting 403 errors (for both the managed identity and developer UPNs).

java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD ...

at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1200)
at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:519)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1713)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:47)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:377)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:332)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:315)

The ADLS storage account has been configured to use a firewall. A third-party vendor's product needs a plain ADLS storage account with no private endpoints to land HR data, and we do not want to allow anonymous access.

Current configurations:

  1. IAM: Synapse managed service identity and developers have been granted Storage Blob Data Contributor roles.
  2. ADLS config: Resource instances that have access to the storage account: 'Microsoft.Synapse/workspaces'
  3. Checked: Allow Azure services on the trusted services list to access this storage account.
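For the resource-instance rule in item 2, the corresponding ARM `networkAcls` fragment can be sketched as below. This is an illustration only; the tenant ID and workspace resource ID are placeholders you would substitute with your own values:

```python
def resource_instance_rule(tenant_id: str, workspace_resource_id: str) -> dict:
    """Sketch of the ARM 'properties.networkAcls' fragment that lets a specific
    Synapse workspace instance through the storage firewall. Field names follow
    the ARM storage account schema."""
    return {
        "networkAcls": {
            "defaultAction": "Deny",    # firewall stays on for everything else
            "bypass": "AzureServices",  # the trusted-services checkbox
            "resourceAccessRules": [
                {"tenantId": tenant_id, "resourceId": workspace_resource_id},
            ],
        }
    }
```

The key point is that `resourceAccessRules` grants access to a specific resource instance, so the firewall's default-deny posture is preserved.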

The ADLS logs show "Azure Synapse Analytics ... blocked" along with a Client IP Address: XXX.XXX.XXX.XXX

The Synapse managed VNet is not directly inspectable, so I cannot determine its outbound IP address. I can query the parquet files from within the Synapse workspace using the linked service.

How can I run a Synapse notebook and query the Workday ADLS storage area?

filename = 'fact_payroll_timecard/*.parquet'
data_path = 'abfss://%s@%s.dfs.core.windows.net/%s/%s' % (raw_container_name, raw_account_name, rawpath, filename)
dfpt = spark.read.parquet(data_path)
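As an aside, the path construction above can be wrapped in a small helper that strips stray slashes before joining. This helper is hypothetical (not part of the original code):

```python
def abfss_path(container: str, account: str, *parts: str) -> str:
    """Build an abfss:// URI for ADLS Gen2 from a container, a storage
    account, and path segments, normalizing leading/trailing slashes."""
    prefix = f"abfss://{container}@{account}.dfs.core.windows.net"
    suffix = "/".join(p.strip("/") for p in parts if p)
    return f"{prefix}/{suffix}" if suffix else prefix

# e.g. data_path = abfss_path(raw_container_name, raw_account_name,
#                             rawpath, 'fact_payroll_timecard/*.parquet')
```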

Briefly turning off IP address filtering successfully returned data.

Adding the CallerIpAddress found in the logs to the firewall's IP address list worked as well. To get the IP address of the calling notebook, turn on Diagnostic Settings for the blob storage and run this query in the Log Analytics workspace:

StorageBlobLogs
| where TimeGenerated > ago(3d)
| project TimeGenerated, OperationName, CallerIpAddress, StatusCode

I don't think this is a long-term solution, as the caller IP address will change.
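If you export the query results (for example as JSON lines from Log Analytics), the distinct caller IPs can be collected with a small sketch like the one below. The export format and the `ip:port` shape of `CallerIpAddress` are assumptions (IPv6 addresses would need different handling):

```python
import json

def caller_ips(log_lines):
    """Collect distinct CallerIpAddress values from exported StorageBlobLogs
    rows. Assumes each line is one JSON object and that CallerIpAddress is
    'ip:port', so the trailing port is stripped."""
    ips = set()
    for line in log_lines:
        row = json.loads(line)
        addr = row.get("CallerIpAddress", "")
        if addr:
            ips.add(addr.rsplit(":", 1)[0])
    return sorted(ips)
```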

Upvotes: 0

Views: 287

Answers (1)

When the notebook is executed via the pipeline, the workspace managed service identity (MSI) is utilized.

Step 1: Ensure the workspace MSI has the necessary permissions to access the storage account data. The simplest way to achieve this is by assigning the workspace MSI to the Storage Blob Data Contributor role on the storage account.


Step 2: If the firewall is enabled on the storage account, follow these instructions: Configure Azure Storage firewalls and virtual networks

Here is an example where the firewall is enabled on the storage account:


When you grant access to trusted Azure services within the storage networking settings, you provide the following types of access:

  • Trusted access for select operations to resources registered in your subscription.
  • Trusted access to resources using system-assigned managed identities.

Learn more: Connect to a secure storage account from your Azure Synapse workspace – Azure Synapse Analytics

Step 3: Configure the Linked Service

Open Synapse Studio and set up the Linked Service to use the workspace MSI:


Step 4: Update the Notebook Code to Utilize the Linked Service Configuration

%%spark
// Allow Spark to access the blob storage remotely via the linked service
val linked_service_name = "LinkedServerName" // replace with your linked service name

spark.conf.set("spark.storage.synapse.linkedServiceName", linked_service_name)
spark.conf.set("fs.azure.account.oauth.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider")

// Replace the container and storage account names
val remotePath = "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/"
print("Remote blob path: " + remotePath)
mssparkutils.fs.ls(remotePath)
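A PySpark equivalent of the cell above can be sketched as follows. Only the two `spark.conf` settings come from the Scala example; the helper name is made up, and the actual read must run inside a Synapse `%%pyspark` cell:

```python
LINKED_SERVICE_NAME = "LinkedServerName"  # replace with your linked service name

def linked_service_conf(name: str) -> dict:
    """Spark conf entries that route ADLS authentication through the Synapse
    linked service (keys and values taken from the Scala example above)."""
    return {
        "spark.storage.synapse.linkedServiceName": name,
        "fs.azure.account.oauth.provider.type":
            "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider",
    }

# Inside a %%pyspark cell in Synapse you would apply these and read as usual:
# for key, value in linked_service_conf(LINKED_SERVICE_NAME).items():
#     spark.conf.set(key, value)
# df = spark.read.parquet("abfss://<container>@<account>.dfs.core.windows.net/<path>")
```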

Reference: Using the workspace MSI to authenticate a Synapse notebook when accessing an Azure Storage account

Upvotes: 0
