Hagai Attias

Reputation: 3

Using distcp to copy to Azure ADLS Gen1 fails with 403

I am trying to copy to Azure Data Lake Storage (ADLS) Gen1, while authenticating using OAuth2.

I am getting the following error:

com.microsoft.azure.datalake.store.ADLException: Error getting info for file /myContainer Operation GETFILESTATUS failed with HTTP403 : null

Here's what my distcp command looks like:

hadoop distcp \
    -Dfs.adl.oauth2.access.token.provider.type=ClientCredential \
    -Dfs.adl.oauth2.client.id=<client_id> \
    -Dfs.adl.oauth2.credential=<key> \
    -Dfs.adl.oauth2.refresh.url=https://login.microsoftonline.com/*****/oauth2/token \
    hdfs:///path/to/file \
    adl://adlsgen1.blob.core.windows.net/myContainer

Any idea what could cause this?

Upvotes: 0

Views: 788

Answers (1)

Jim Xu

Reputation: 23141

If you want to use Azure AD to access Azure Data Lake Gen2 with a service principal, you need to assign an RBAC role to the service principal.

For example:

  1. Create a service principal and assign the Storage Blob Data Owner role to it (I use the Azure CLI):
az ad sp create-for-rbac -n "MyApp" --role "Storage Blob Data Owner" \
    --scopes /subscriptions/{SubID}/resourceGroups/{ResourceGroup1} \
    /subscriptions/{SubID}/resourceGroups/{ResourceGroup2}
  2. Configure distcp to authenticate with the service principal's credentials:
hadoop distcp \
    -D fs.azure.account.auth.type=OAuth \
    -D fs.azure.account.oauth.provider.type=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider \
    -D fs.azure.account.oauth2.client.endpoint=[token endpoint] \
    -D fs.azure.account.oauth2.client.id=[Application client ID] \
    -D fs.azure.account.oauth2.client.secret=[client secret] \
    hdfs:///path/to/file \
    abfs://<container>@<account>.dfs.core.windows.net/
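Alternatively, these properties can be placed in core-site.xml so they do not have to be passed on every distcp invocation. A minimal sketch, assuming a client-credentials setup with hypothetical placeholder values (tenant ID, client ID, secret):

```xml
<!-- core-site.xml: OAuth client-credentials configuration for the abfs driver.
     All placeholder values in curly braces are assumptions to be replaced. -->
<configuration>
  <property>
    <name>fs.azure.account.auth.type</name>
    <value>OAuth</value>
  </property>
  <property>
    <name>fs.azure.account.oauth.provider.type</name>
    <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.endpoint</name>
    <value>https://login.microsoftonline.com/{tenant-id}/oauth2/token</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.id</name>
    <value>{client-id}</value>
  </property>
  <property>
    <name>fs.azure.account.oauth2.client.secret</name>
    <value>{client-secret}</value>
  </property>
</configuration>
```

Keeping the secret in a site file rather than on the command line also avoids exposing it in the shell history and process list.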

For more details, please refer to the Hadoop ABFS driver documentation and the Azure RBAC documentation.

Upvotes: 1
