DataSteve
DataSteve

Reputation: 61

Attempting to Read parcquet files on linked storage in Azure Synapse

I am attempting to give access to parquet files on a Gen2 Data Lake container. I have owner RBAC on the container but would prefer to limit access in the container for other users.

My Query is very simple:

SELECT
  TOP 100 *
FROM
  OPENROWSET(
    BULK 'https://aztsworddataaipocacldl.dfs.core.windows.net/pocacl/Top/Sub/part-00006-c62926ba-c530-4ad8-87d1-cf38c67a2da3-c000.snappy.parquet',
    FORMAT='PARQUET'
  ) AS [result]

When I run this I have no problems connecting. I have attempted to add ACL rights onto the files (and of course the containing folders 'Top' and 'Sub').

I've give RWX on the 'Top' folder using Storage Explorer and default so that it cascades to the 'Sub' folder and parquet files as I add them

When my colleague attempts to run the SQL script the get the error message. Failed to execute query. Error: File 'https://aztsworddataaipocacldl.dfs.core.windows.net/pocacl/Top/Sub/part-00006-c62926ba-c530-4ad8-87d1-cf38c67a2da3-c000.snappy.parquet' cannot be opened because it does not exist or it is used by another process. NB similar results are also experienced in Spark but with a 403 instead

SQL on-demand provides a link to the following help file after the error, it suggests:

If your query fails with the error saying 'File cannot be opened because it does not exist or it is used by another process' and you're sure both file exist and it's not used by another process it means SQL on-demand can't access the file. This problem usually happens because your Azure Active Directory identity doesn't have rights to access the file. By default, SQL on-demand is trying to access the file using your Azure Active Directory identity. To resolve this issue, you need to have proper rights to access the file. Easiest way is to grant yourself 'Storage Blob Data Contributor' role on the storage account you're trying to query.

I don't wish to grant Storage Blob Data Contributor or Storage Blob Data Reader as this gives access to every file on the container and not just those I want end users to be able to query. We have found the same experience occurs for SSMS connecting to parquet external tables.

So then in parts:

  1. Is this the correct pattern using ACL to grant access, or should I use another method?
  2. Are there settings on the Storage Account or within my query/notebook that I should be enabling to support ACL?*
  3. Has ACL been implemented on Synapse Workspace to date given that we're still in preview?

*I have resisted pasting my entire settings as I really have no idea what is relevant and what entirely irrelevant to this issue but of course can supply.

Upvotes: 2

Views: 4844

Answers (3)

Nehal Alex
Nehal Alex

Reputation: 1

Go to azure storage explorer change ACL permissions in the route Folder and right click on your storage and click on "propogate access control lists"

Upvotes: 0

DataSteve
DataSteve

Reputation: 61

It would appear that the ACL feature was not working correctly in Preview for Azure Synapse Analytics.

I have now managed to get it to work. At present I see that once Read|Execute is provided to a folder it allows access to the files contained within that folder and sub folders. Access is available even when no specific ACL access is provided on a file in a sub folder. This is not quite what I expected however it provides enough for me to proceed: only giving access to the Gold folder allows for separation of access to the files I want to let users query and the working files that I want to keep hidden.

Upvotes: 2

Jovan MSFT
Jovan MSFT

Reputation: 14610

When you assign ACL to a folder it's not propagated recursively to all files inside the folder. Only new files inherit from the folder.

You can see this here

Upvotes: 2

Related Questions