Into Numbers

Reputation: 973

Azure Function Python write to Azure DataLake Gen2

I want to write a file to my Azure DataLake Gen2 with an Azure Function and Python.

Unfortunately I'm having the following authentication issue:

Exception: ClientAuthenticationError: (InvalidAuthenticationInfo) Server failed to authenticate the request. Please refer to the information in the www-authenticate header.

'WWW-Authenticate': 'REDACTED'

Both my account and the Function app should have the necessary roles for accessing my DataLake assigned.

And here is my function:

import datetime
import logging

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func

def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.info('The timer is past due!')

    credential = DefaultAzureCredential()
    service_client = DataLakeServiceClient(account_url="https://<datalake_name>.dfs.core.windows.net", credential=credential)

    file_system_client = service_client.get_file_system_client(file_system="temp")
    directory_client = file_system_client.get_directory_client("test")
    file_client = directory_client.create_file("uploaded-file.txt")
    
    file_contents = 'some data'
    file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
    file_client.flush_data(len(file_contents))


    logging.info('Python timer trigger function ran at %s', utc_timestamp)

What am I missing?

THX & BR

Peter

Upvotes: 2

Views: 2510

Answers (2)

Guido van Steen

Reputation: 534

The function suggested by Bowman Zhu contains an error. According to the Azure documentation, the "length" parameter expects a length in bytes. However, the suggested function passes the length in characters. Some of those characters may encode to multiple bytes; in that case the function will not write all the bytes of file_contents to the file, and thus cause data loss!

Therefore,

file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
file_client.flush_data(len(file_contents))

must be something like:

length = len(file_contents.encode())
file_client.append_data(data=file_contents, offset=0, length=length)
file_client.flush_data(offset=length)
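The difference is easy to demonstrate with plain Python, no Azure SDK needed; the string below is a made-up example containing the multi-byte character 'ä':

```python
# A character count is not a byte count once non-ASCII characters appear:
# 'ä' occupies two bytes in UTF-8, so len() understates the payload size.
file_contents = 'dätä'
print(len(file_contents))           # 4 characters
print(len(file_contents.encode()))  # 6 bytes in UTF-8
```

Passing 4 as "length" here would silently truncate the last two bytes of the upload.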

Upvotes: 0

suziki

Reputation: 14113

The problem seems to come from DefaultAzureCredential.

The identity DefaultAzureCredential uses depends on the environment. When an access token is needed, it requests one using these identities in turn, stopping when one provides a token:

1. A service principal configured by environment variables. 
2. An Azure managed identity. 
3. On Windows only: a user who has signed in with a Microsoft application, such as Visual Studio.
4. The user currently signed in to Visual Studio Code.
5. The identity currently logged in to the Azure CLI.
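If you want to keep using DefaultAzureCredential, the first option in the chain (a service principal) is configured through environment variables on the Function app. A minimal sketch, assuming you have an app registration with a role assignment on the storage account; all values are placeholders:

```shell
# Environment variables read by EnvironmentCredential,
# the first credential that DefaultAzureCredential tries.
# Replace the placeholders with your app registration's values
# (in a Function app, set these as application settings).
export AZURE_TENANT_ID="<tenant-id>"
export AZURE_CLIENT_ID="<client-id>"
export AZURE_CLIENT_SECRET="<client-secret>"
```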

In fact, you can create the Data Lake service client without using the default credentials at all. You can connect directly with the storage account's connection string instead:

import logging
import datetime

from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    service_client = DataLakeServiceClient.from_connection_string(connect_str)

    file_system_client = service_client.get_file_system_client(file_system="test")
    directory_client = file_system_client.get_directory_client("test")
    file_client = directory_client.create_file("uploaded-file.txt")
    
    file_contents = 'some data'
    file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
    file_client.flush_data(len(file_contents))

    return func.HttpResponse(
            "Test.",
            status_code=200
    )

In addition, to make sure the data can be written successfully, please check whether your Data Lake has network access restrictions.

Upvotes: 1
