ldacey

Reputation: 578

Compressing a stream from Azure Blob (Python SDK)

Can I compress data from Azure Blob to gzip as I download it? I would like to avoid having all data in memory if possible.

I tried two different approaches (the compress_chunk and compress_blob functions below). I am not sure whether the entire blob was in memory before compression, though, or whether I can compress it as it is read in somehow.

import gzip
import io

def compress_chunk(data):
    # Rewind the source stream and gzip it into an in-memory buffer,
    # 4 MiB at a time.
    data.seek(0)
    compressed_body = io.BytesIO()
    with gzip.open(compressed_body, mode='wb') as compressor:
        while True:
            chunk = data.read(1024 * 1024 * 4)
            if not chunk:
                break
            compressor.write(chunk)
    compressed_body.seek(0)
    return compressed_body

def compress_blob(data):
    # Compress the buffer in one shot; the raw bytes and the compressed
    # bytes are both held in memory at the same time.
    return gzip.compress(data.getvalue())

def process_download(container_name, blob):
    # Download the whole blob into an in-memory stream, then compress it.
    with io.BytesIO() as input_io:
        blob_service.get_blob_to_stream(container_name=container_name, blob_name=blob.name, stream=input_io)
        return compress_chunk(data=input_io)

Upvotes: 0

Views: 821

Answers (1)

suziki

Reputation: 14113

I think you already know how to compress data, so the following is just to clarify a few points.

I am not sure if the entire blob was in memory though before compression.

When we need to process blob data, we use the official SDK method to download the blob. At that point the data arrives as a stream: it is not written to disk, but it does occupy memory allocated by the program.
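With the v12 azure-storage-blob SDK (the BlobClient linked below), that download looks roughly like this; the connection string, container, and blob names are placeholders:

import io

from azure.storage.blob import BlobClient

# Placeholder connection details.
blob_client = BlobClient.from_connection_string(
    conn_str="<connection-string>",
    container_name="<container>",
    blob_name="<blob>",
)

# download_blob() returns a StorageStreamDownloader; readinto() writes
# the blob's bytes into the stream we supply, here an in-memory buffer.
buffer = io.BytesIO()
blob_client.download_blob().readinto(buffer)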

Azure does not provide a method to compress the data on the service side before you download it:

https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobclient?view=azure-python#methods

Therefore, when we want to process the data we must download it first, and a downloaded stream will of course take up memory.
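That said, you can avoid holding the entire blob in memory at once. Here is a minimal sketch, assuming the v12 SDK, where the StorageStreamDownloader returned by download_blob() exposes a chunks() iterator: each chunk is compressed as it arrives, so only the current chunk and the compressed output are resident at any time.

import gzip
import io

from azure.storage.blob import BlobClient

def compress_blob_streaming(blob_client: BlobClient) -> io.BytesIO:
    # Stream the download and compress chunk by chunk instead of
    # buffering the whole blob first.
    compressed_body = io.BytesIO()
    downloader = blob_client.download_blob()
    with gzip.open(compressed_body, mode='wb') as compressor:
        for chunk in downloader.chunks():  # raw blob bytes, chunk by chunk
            compressor.write(chunk)
    compressed_body.seek(0)
    return compressed_body

The compressed output still accumulates in memory here; pointing gzip.open at a file on disk instead of a BytesIO would keep memory usage flat for large blobs.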

Upvotes: 1
