Reputation: 2611
I have a large number of containers in my Azure Blob Storage account. Each container holds several files (blobs) of different formats, e.g., txt or json. I need to compute the hash of these files and store it in a database. Currently, we download the blobs in a container to our local machine and then compute the hash.
Everything works, but there's one problem. The blobs are fairly large (on average, the blobs in a container total about 1 GB), so downloading them every time just to compute the hash is very costly in terms of bandwidth. Is there any way to apply our hash computation method (which is a Python method) to the blobs in a container on Azure, without downloading the blobs?
Upvotes: 1
Views: 109
Reputation: 2818
This is possible if you're satisfied with an MD5 hash. Azure Storage stores an MD5 hash of each blob as a base64-encoded representation of the binary MD5 digest. This value is calculated on upload, unless the file is large enough that it is written in blocks.
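To make the relationship between the stored value and the usual hex digest concrete, here is a small stdlib-only sketch. It assumes the stored value is the standard base64 encoding of the raw 16-byte MD5 digest (the form carried by the `Content-MD5` header); `content_md5_header` and `header_to_hex` are hypothetical helper names used only for illustration.

```python
import base64
import hashlib


def content_md5_header(data: bytes) -> str:
    """Base64-encode the raw 16-byte MD5 digest, the form used by the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def header_to_hex(content_md5_b64: str) -> str:
    """Decode a base64 Content-MD5 value back to the familiar 32-character hex digest."""
    return base64.b64decode(content_md5_b64).hex()


payload = b"hello world"
b64_value = content_md5_header(payload)
hex_value = header_to_hex(b64_value)
# The round trip matches the hex digest hashlib computes directly
assert hex_value == hashlib.md5(payload).hexdigest()
print(b64_value, hex_value)
```

This means a hash you already store as hex can be compared with Azure's value after a simple decode, with no need to recompute anything from the blob contents.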
You can fetch the MD5 hash of a blob using the Azure Storage SDK:
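```python
import binascii
from azure.storage.blob import BlobServiceClient

# Placeholder connection string, container and blob names
service = BlobServiceClient.from_connection_string("<connection-string>")
blob_client = service.get_blob_client(container="<container>", blob="<blob>")
content_settings = blob_client.get_blob_properties().content_settings
# content_md5 is None when Azure has no stored MD5 for the blob
md5_hex = binascii.hexlify(bytearray(content_settings.content_md5)).decode("utf-8")
```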
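Since the question involves many containers, it may help that the stored MD5 also comes back when listing a container, so one listing call covers every blob without downloading any data. A minimal sketch, assuming the `azure-storage-blob` v12 package; the connection string, container name and the helper names `md5_bytes_to_hex` and `container_md5s` are hypothetical placeholders. Blobs uploaded in blocks without an explicit MD5 show up as `None`.

```python
import binascii
from typing import Dict, Optional


def md5_bytes_to_hex(content_md5: Optional[bytes]) -> Optional[str]:
    """Convert the SDK's content_md5 bytes to hex, or None if Azure stored no MD5."""
    if content_md5 is None:
        return None
    return binascii.hexlify(bytearray(content_md5)).decode("utf-8")


def container_md5s(conn_str: str, container_name: str) -> Dict[str, Optional[str]]:
    """Map each blob name in a container to its stored MD5 (hex) without downloading data."""
    # Imported lazily so the helper above works without the Azure SDK installed
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(conn_str, container_name)
    # list_blobs() yields BlobProperties, which already include content_settings
    return {
        blob.name: md5_bytes_to_hex(blob.content_settings.content_md5)
        for blob in container.list_blobs()
    }
```

The resulting dictionary can then be written to the database; any `None` entries mark blobs whose hash would still have to be computed another way.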
For very large files, you can avoid the problem of Azure not calculating the MD5 value on upload by computing the MD5 hash locally before uploading the file to Azure Storage and setting it explicitly in the upload call:
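```python
# Older azure-storage SDK API; object_md5Hash is the MD5 computed locally before upload
blob_service.put_block_blob_from_path(
    container_name='container_name',
    blob_name=object_name,
    file_path=object_name,
    x_ms_blob_content_md5=object_md5Hash
)
```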
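`put_block_blob_from_path` belongs to an older SDK and is not available in the current `azure-storage-blob` v12 package. The same idea there might look like the sketch below, which hashes the file in chunks (so a ~1 GB file never sits in memory) and passes the digest through `ContentSettings(content_md5=...)`. This assumes the v12 package; `file_md5`, `upload_with_md5`, the 4 MiB chunk size and all connection details are hypothetical choices for illustration.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 4 * 1024 * 1024) -> bytes:
    """Compute the raw 16-byte MD5 digest of a file, reading it in chunks to bound memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.digest()


def upload_with_md5(conn_str: str, container_name: str, blob_name: str, path: str) -> None:
    """Upload a file and store its precomputed MD5 as the blob's Content-MD5 property."""
    # Imported lazily so file_md5 works without the Azure SDK installed
    from azure.storage.blob import BlobClient, ContentSettings

    digest = file_md5(path)
    blob = BlobClient.from_connection_string(conn_str, container_name, blob_name)
    with open(path, "rb") as data:
        blob.upload_blob(
            data,
            overwrite=True,
            content_settings=ContentSettings(content_md5=bytearray(digest)),
        )
```

Because the digest is computed once at upload time, later hash lookups only need the blob properties, not the blob contents.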
According to a GitHub issue, the current SDK favors transactional MD5 hashing (verifying each request in transit) over storing the MD5 value, which means we need to use the x_ms_blob_content_md5 value to store the hash for large files when doing this manually.
Hope this helps!
Upvotes: 0