Reputation: 652
I had to upgrade a Docker container that was using an older version of Microsoft Azure's Python packages to download data from an API and then upload a JSON file to Azure Blob Storage. Since pip-installing the former "azure" metapackage is no longer allowed, I have to use the new standalone packages (azure-storage-blob==12.6.0).
After switching from create_blob_from_path on the BlockBlobService bundled in the old "azure" package to the new standalone package, BlobClient.upload_blob() fails on larger files with a timeout error that completely ignores the timeout parameter of the function.
I get a ServiceResponseError with the message "Connection aborted / The write operation timed out".
Is there any way to solve this error?
The new function feels like a huge step backwards from create_blob_from_path; the absence of a progress_callback in particular is deplorable...
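For reference, a minimal sketch of the kind of call that fails (connect_str, the container/blob names and the local file are placeholders):
from azure.storage.blob import BlobClient

# Placeholder connection string and names; the real code builds the client the same way.
blob_client = BlobClient.from_connection_string(
    connect_str, container_name="my-container", blob_name="data.json"
)

with open("data.json", "rb") as data:
    # On larger files this raises ServiceResponseError
    # ("Connection aborted / The write operation timed out"),
    # regardless of the timeout value passed here.
    blob_client.upload_blob(data, overwrite=True, timeout=600)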
Upvotes: 4
Views: 22465
Reputation: 176
The correct solution, if your control flow allows it, seems to be setting the max_single_put_size to something smaller (like 4MB) when you create the BlobClient. You can do this with a keyword parameter when calling the constructor.
However, as near as I can tell, this parameter cannot be configured if you create a BlobClient through the BlobClient.from_blob_url control flow. The default value is 64MB, and it is easy to hit the default connection timeout before a 64MB PUT is done. In some applications you may not have access to auth credentials for the storage account (i.e. you're just using a signed URL), so the only way to create a BlobClient is from a BlobClient.from_blob_url call.
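As a concrete illustration (just a sketch; the account URL, credential and names below are placeholders), the option is passed as a keyword argument when constructing the client:
from azure.storage.blob import BlobClient

# Sketch: passing max_single_put_size at construction time makes the SDK
# stage anything above this threshold as blocks instead of one large put.
blob_client = BlobClient(
    account_url="https://<account>.blob.core.windows.net",  # placeholder
    container_name="my-container",                          # placeholder
    blob_name="large-file.json",                            # placeholder
    credential=credential,                                   # placeholder credential object
    max_single_put_size=4 * 1024 * 1024,  # 4 MB
)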
It seems like the workaround is to set the poorly documented connection_timeout parameter on the upload_blob call, instead of the timeout parameter. So, something like:
upload_result = block_blob_client.upload_blob(
    data,
    blob_type="BlockBlob",
    content_settings=content_settings,
    length=file_size,
    connection_timeout=600,
)
That parameter is documented on this page:
However, it is not currently documented in the official BlobClient documentation:
I've filed this documentation bug: https://github.com/Azure/azure-sdk-for-python/issues/22936
Upvotes: 9
Reputation: 1
This worked for me:
from azure.storage.blob import BlobServiceClient

# Transfer options must be passed to the constructor; assigning them as
# attributes afterwards has no effect in the v12 SDK.
blob_service_client = BlobServiceClient.from_connection_string(
    connect_str,
    max_single_put_size=4 * 1024 * 1024,  # split uploads into 4 MB puts
)
container_client = blob_service_client.get_container_client(container_name)
container_client.upload_blob(name=key, data=file, max_concurrency=12, timeout=180)
Upvotes: 0
Reputation: 29985
I'm not sure how you set the timeout value; here is an example of uploading a blob with the timeout setting:
with open(upload_file_path, "rb") as data:
    blob_client.upload_blob(data=data, timeout=600)  # timeout is set to 600 seconds
If the timeout is ignored, another workaround is to upload the blob in chunks, with code like below:
import uuid
from azure.storage.blob import BlobBlock

# upload data in chunks
block_list = []
chunk_size = 1024  # chunk size in bytes
with open(upload_file_path, 'rb') as f:
    while True:
        read_data = f.read(chunk_size)
        if not read_data:
            break  # done
        blk_id = str(uuid.uuid4())
        blob_client.stage_block(block_id=blk_id, data=read_data)
        block_list.append(BlobBlock(block_id=blk_id))
blob_client.commit_block_list(block_list)
Upvotes: 0
Reputation: 7483
I tested with the following code, and it uploaded the file (~10 MB) successfully.
blob_service_client = BlobServiceClient.from_connection_string(connect_str)

# Create a blob client using the local file name as the name for the blob
blob_client = blob_service_client.get_blob_client(container=container_name, blob=local_file_name)

# Upload content to block blob
with open(SOURCE_FILE, "rb") as data:
    blob_client.upload_blob(data, blob_type="BlockBlob")
Upvotes: 0