Ashish Gupta

Reputation: 2614

Efficiently copy data from Azure blob storage to S3

I want to move about 1 million files from Azure Storage to S3. I wrote this Python script using the Azure Python SDK and boto3.

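# azure_blob_service and s3_client are assumed to be already-configured clients.
marker = None
while True:
    # List one page of blobs; marker is None for the first page.
    batch = azure_blob_service.list_blobs(
        copy_from_container, marker=marker)
    # Copy the blobs in this page one at a time.
    for blob in batch:
        blob_name = blob.name
        # Download the whole blob into memory.
        current_blob = azure_blob_service.get_blob_to_bytes(
            copy_from_container, blob_name)
        # Upload it to S3 under the same name, keeping the content type.
        s3_client.put_object(
            Body=current_blob.content,
            Bucket=s3_bucket,
            ContentType=current_blob.properties.content_settings.content_type,
            Key=blob_name)
    # No next_marker means this was the last page.
    if not batch.next_marker:
        break
    marker = batch.next_marker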

But this is slow.

How can I move the data from Azure to S3 efficiently?

Upvotes: 2

Views: 4489

Answers (1)

Gaurav Mantri

Reputation: 136156

S3 does not support a server-side asynchronous blob copy the way Azure Blob Storage does, so it cannot pull the data from Azure for you. To move data from Azure Storage to S3, you need to download each blob from Azure Storage and then upload it to S3. Because all of that data passes through the machine running your script, its Internet speed is what limits the transfer.

If you want to speed up the whole process, one option is to run the script on a VM in Amazon itself, ideally in the same region as your S3 bucket. The downloads should then be much faster (assuming Amazon offers better Internet speeds than what you currently have), and the uploads will be faster as well because they stay within the same region.
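For illustration, here is a minimal sketch of the download-then-upload step written so that it does not care where it runs. Everything in it is an assumption rather than something from the question: `copy_one`, `copy_all`, and the `download` / `upload` callables are hypothetical names, and running several copies at once with a thread pool is an extra technique I am adding here, not part of the original script. It may help when each file spends most of its time waiting on the network, but I have not measured it against your data.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_one(name, download, upload):
    """Download one blob and upload it under the same key; return the key."""
    # download is a hypothetical callable returning (bytes, content_type).
    data, content_type = download(name)
    # upload is a hypothetical callable taking (key, bytes, content_type).
    upload(name, data, content_type)
    return name


def copy_all(names, download, upload, max_workers=16):
    """Copy every named blob, overlapping transfers across worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order and re-raises the first
        # exception from a worker when its result is consumed.
        return list(pool.map(lambda n: copy_one(n, download, upload), names))
```

In your case `download` could wrap `azure_blob_service.get_blob_to_bytes(copy_from_container, name)` and `upload` could wrap `s3_client.put_object(...)`, with `names` taken from the `list_blobs` pages. Check that the clients you use are safe to share across threads before doing this, and treat `max_workers=16` as an arbitrary starting value to tune.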

Upvotes: 2
