Reputation: 2408
I am trying to upload a blob to Azure Blob Storage with the Python SDK. I want to pass the MD5 hash for validation on the server side after upload.
Here's the code:
blob_service.put_block_blob_from_path(
    container_name='container_name',
    blob_name='upload_dir/' + object_name,
    file_path=object_name,
    content_md5=object_md5Hash
)
But I get this error:
AzureHttpError: The MD5 value specified in the request did not match with the MD5 value calculated by the server.
The file is ~200 MB and the error is thrown instantly, without the file being uploaded. So I suspect that it may be comparing the supplied hash with the hash of the first chunk or something similar.
Any ideas?
Upvotes: 0
Views: 2736
Reputation: 2457
This is sort of an SDK bug in that we should throw a better error message rather than hitting the service, but validating the content of a large upload that has to be chunked simply doesn't work. x_ms_blob_content_md5 will store the MD5, but the service will not validate it; that is something you could do on download, though. content_md5 is validated by the server against the body of a particular request, but since a chunked blob is sent in more than one request, it will never work.
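A rough sketch of the download-side check mentioned above, using only the standard library. Content-MD5 values are the base64 encoding of the raw 16-byte digest rather than the hex string, so the local hash has to be computed in that form before comparing. The helper names `md5_base64` and `matches_stored_md5` are hypothetical, and reading the stored value back from the blob's properties is left out because it depends on the SDK version.

```python
import base64
import hashlib


def md5_base64(path, chunk_size=4 * 1024 * 1024):
    """Return the base64-encoded MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size pieces so a large file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_stored_md5(path, stored_md5):
    """Compare a downloaded file against the MD5 string stored with the blob."""
    return md5_base64(path) == stored_md5
```

The same `md5_base64` value is what you would hand to x_ms_blob_content_md5 at upload time, so that the comparison after download is like for like.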
So, if the blob is small enough (below BLOB_MAX_DATA_SIZE) to be put in a single request, content_md5 will work fine. Otherwise, I'd simply recommend using HTTPS and storing the MD5 in x_ms_blob_content_md5 if you think you might want to download over HTTP and validate it on download. HTTPS already provides validation for things like bit flips on the wire, so using it for upload/download will do a lot. If you can't upload/download with HTTPS for one reason or another, you can consider chunking the blob yourself using the put block and put block list APIs.
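For the last option, chunking the blob yourself, a minimal sketch might look like the following. It assumes the `blob_service` object exposes `put_block(container_name, blob_name, block, blockid, content_md5=...)` and `put_block_list(container_name, blob_name, block_list)` as in the legacy SDK; check those signatures against your installed version. The helper names `iter_blocks` and `upload_in_blocks`, the 4 MB default block size, and the zero-padded block IDs are illustrative choices, not part of the SDK.

```python
import base64
import hashlib


def iter_blocks(path, block_size=4 * 1024 * 1024):
    """Yield (block_id, data, md5_b64) for each block of the file."""
    with open(path, "rb") as handle:
        index = 0
        for data in iter(lambda: handle.read(block_size), b""):
            # Zero-padded so that every block ID in the blob has the same length.
            block_id = "{:08d}".format(index)
            md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
            yield block_id, data, md5_b64
            index += 1


def upload_in_blocks(blob_service, container_name, blob_name, path,
                     block_size=4 * 1024 * 1024):
    """Upload `path` one block per request, then commit the block list."""
    block_ids = []
    for block_id, data, md5_b64 in iter_blocks(path, block_size):
        # ASSUMPTION: legacy-SDK style put_block; each request carries the MD5
        # of its own body, which is what content_md5 is checked against.
        blob_service.put_block(container_name, blob_name, data, block_id,
                               content_md5=md5_b64)
        block_ids.append(block_id)
    # ASSUMPTION: legacy-SDK style put_block_list taking the ordered block IDs.
    blob_service.put_block_list(container_name, blob_name, block_ids)
    return block_ids
```

Because each block travels in its own request with its own hash, the per-request validation that fails for the whole-file MD5 can succeed here.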
FYI: In future versions we do intend to add automatic MD5 calculation for both single put and chunked operations in the library itself, which will fully solve this. For the next version, we will add an improved error message if content_md5 is specified for a chunked upload.
Upvotes: 1
Reputation: 136356
I think there are two things going on here.
To fix the issue in the interim, please modify the source code in blobservice.py and comment out the following lines of code:
self.put_blob(
    container_name,
    blob_name,
    None,
    'BlockBlob',
    content_encoding,
    content_language,
    content_md5,
    cache_control,
    x_ms_blob_content_type,
    x_ms_blob_content_encoding,
    x_ms_blob_content_language,
    x_ms_blob_content_md5,
    x_ms_blob_cache_control,
    x_ms_meta_name_values,
    x_ms_lease_id,
)
I have created a new issue on GitHub for this: https://github.com/Azure/azure-storage-python/issues/99.
You are passing the MD5 hash in the content_md5 parameter. This will not work for you. You should actually pass the MD5 hash in the x_ms_blob_content_md5 parameter. So your call should be:
blob_service.put_block_blob_from_path(
    container_name='container_name',
    blob_name='upload_dir/' + object_name,
    file_path=object_name,
    x_ms_blob_content_md5=object_md5Hash
)
Upvotes: 0
Reputation: 24148
I reviewed the source code of the function put_block_blob_from_path in the Azure Blob Storage SDK. The function's comment explains this case; please see the content below and refer to https://github.com/Azure/azure-storage-python/blob/master/azure/storage/blob/blobservice.py.
content_md5:
Optional. An MD5 hash of the blob content. This hash is used to verify the integrity of the blob during transport. When this header is specified, the storage service checks the hash that has arrived with the one that was sent. If the two hashes do not match, the operation will fail with error code 400 (Bad Request).
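Since that check applies to the body of a single request, one way to avoid the error is to choose the MD5 parameter from the file size. The sketch below is only an illustration: `md5_upload_kwargs` is a hypothetical helper, and `single_put_limit` stands in for the SDK's single-request threshold (BLOB_MAX_DATA_SIZE), whose actual value you would need to read from your SDK version.

```python
import os


def md5_upload_kwargs(file_path, md5_b64, single_put_limit):
    """Return the MD5 keyword argument suited to how the file will be sent."""
    if os.path.getsize(file_path) < single_put_limit:
        # Fits in one request: the service can validate the body against it.
        return {"content_md5": md5_b64}
    # Chunked upload: only store the hash with the blob; it is not validated.
    return {"x_ms_blob_content_md5": md5_b64}
```

The result can be unpacked into the upload call, for example put_block_blob_from_path(..., **md5_upload_kwargs(object_name, object_md5Hash, limit)), where limit is whatever threshold your SDK uses.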
Upvotes: 0