Vincent Chalmel

Reputation: 652

Solve timeout errors on file uploads with new azure.storage.blob package

I had to upgrade a Docker container that was using the older version of Microsoft Azure's Python packages to download data from an API and then upload a JSON file to Azure Blob Storage. Since pip installing the former "azure" metapackage is no longer allowed, I have to use the new standalone packages (azure-storage-blob==12.6.0).

After switching from the create_blob_from_path function of the BlockBlobService integrated in the old "azure" package to the new standalone package, BlobClient.upload_blob() fails on larger files with a timeout error that completely ignores the timeout parameter of the function.
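
The call now looks roughly like this (a minimal sketch; the connection string, container, and file names are placeholders):

from azure.storage.blob import BlobClient

blob_client = BlobClient.from_connection_string(
    conn_str, container_name="my-container", blob_name="data.json"
)
with open(upload_file_path, "rb") as data:
    blob_client.upload_blob(data, overwrite=True, timeout=600)  # the timeout appears to be ignored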

I get a ServiceResponseError with the message "Connection aborted / The write operation timed out".

Is there any way to solve this error?

The new function feels like a huge step backwards from create_blob_from_path; above all, the absence of a progress_callback is deplorable...

Upvotes: 4

Views: 22465

Answers (4)

Joel Brandt

Reputation: 176

The correct solution, if your control flow allows it, seems to be setting the max_single_put_size to something smaller (like 4MB) when you create the BlobClient. You can do this with a keyword parameter when calling the constructor.
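
For example, a minimal sketch (the account URL, credential, container and blob names here are placeholders):

from azure.storage.blob import BlobClient

blob_client = BlobClient(
    account_url="https://<account>.blob.core.windows.net",
    container_name="my-container",
    blob_name="my-blob.json",
    credential=credential,  # whatever credential your app uses
    max_single_put_size=4 * 1024 * 1024,  # 4 MiB; larger uploads are split into blocks
)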

However, as near as I can tell, this parameter cannot be configured if creating a BlobClient through the BlobClient.from_blob_url control flow. The default value for this is 64MB, and it is easy to hit the default connection timeout before a 64MB PUT is done. In some applications, you may not have access to auth credentials for the storage account (i.e. you're just using a signed URL), so the only way to create a BlobClient is from a BlobClient.from_blob_url call.
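
For reference, that control flow looks like this (the SAS URL is a placeholder):

blob_client = BlobClient.from_blob_url(
    "https://<account>.blob.core.windows.net/<container>/<blob>?<sas-token>"
)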

It seems like the workaround is to set the poorly documented connection_timeout parameter on the upload_blob call, instead of the timeout parameter. So, something like:

upload_result = block_blob_client.upload_blob(
    data,
    blob_type="BlockBlob",
    content_settings=content_settings,
    length=file_size,
    connection_timeout=600,  # socket-level timeout in seconds, distinct from the server-side timeout parameter
)

That parameter is documented on this page:

https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/azure-storage-blob#other-client--per-operation-configuration

However, it is not currently mentioned in the official BlobClient documentation:

https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobclient?view=azure-python

I've filed this documentation bug: https://github.com/Azure/azure-sdk-for-python/issues/22936

Upvotes: 9

junkchaser

Reputation: 1

This worked for me:

from azure.storage.blob import BlobServiceClient

# Pass the settings to the constructor; assigning them as attributes after
# construction has no effect, since the client captures its configuration at creation.
blob_service_client = BlobServiceClient.from_connection_string(
    connect_str,
    max_single_put_size=4 * 1024 * 1024,  # split uploads larger than 4 MiB into blocks
    connection_timeout=180,               # socket timeout in seconds
)
container_client = blob_service_client.get_container_client(container_name)
container_client.upload_blob(data=file, name=key, max_concurrency=12)

Also check https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobclient?view=azure-python

Upvotes: 0

Ivan Glasenberg

Reputation: 29985

I'm not sure how you set the timeout value; here is an example of uploading a blob with the timeout set:

with open(upload_file_path, "rb") as data:
    blob_client.upload_blob(data=data, timeout=600)  # timeout is set to 600 seconds

If the timeout is ignored, another workaround is to upload the blob in chunks, with code like below:

import uuid

from azure.storage.blob import BlobBlock

# upload the data in chunks
block_list = []
chunk_size = 1024  # bytes; use a much larger chunk (e.g. 4 MiB) in practice, as a blob allows at most 50,000 blocks

with open(upload_file_path, "rb") as f:
    while True:
        read_data = f.read(chunk_size)
        if not read_data:
            break  # done
        blk_id = str(uuid.uuid4())
        blob_client.stage_block(block_id=blk_id, data=read_data)
        block_list.append(BlobBlock(block_id=blk_id))

blob_client.commit_block_list(block_list)

Upvotes: 0

unknown

Reputation: 7483

I tested with the following code, and it uploaded the file (~10 MB) successfully.

blob_service_client = BlobServiceClient.from_connection_string(connect_str)
# Create a blob client using the local file name as the name for the blob
blob_client = blob_service_client.get_blob_client(container=container_name, blob=local_file_name)

# Upload content to block blob
with open(SOURCE_FILE, "rb") as data:
    blob_client.upload_blob(data, blob_type="BlockBlob")

Upvotes: 0
