TrayMan

Reputation: 7445

How to copy Azure blobs quickly in Python

I need to copy about 1000 blobs at a time from one storage account to another. Each blob is roughly 100 to 1000 MB. Each blob is also renamed, so I cannot copy the blobs in bulk using a common prefix.

The approach I've taken is to use BlobClient.start_copy_from_url() to start an asynchronous copy operation for each blob and then wait for all of them to complete. The problem is that copying the blobs this way takes hours. The operations seem to complete in batches of around 6 at a time, which makes me think something prevents more of them from being processed in parallel.

In comparison, it takes about 5 minutes for Storage Explorer to copy the same blobs between the storage accounts.

How does Storage Explorer copy files so quickly and is there a way to make my Python script copy blobs faster?

My code is essentially similar to this:

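import time

active_jobs = []
for job in pending_jobs:  # ~1000 pending jobs, built elsewhere
    job.target = job.target_client.get_blob_client(job.target_path)
    source = job.source_client.get_blob_client(job.source_path)
    # Starts a server-side copy and returns before the copy finishes;
    # source.url must be readable by the destination account (e.g. carry a SAS token)
    job.target.start_copy_from_url(source.url)
    job.done = False
    active_jobs.append(job)

# Poll each destination blob until its copy is no longer pending
while active_jobs:
    for job in active_jobs:
        status = job.target.get_blob_properties().copy.status
        if status != "pending":  # "success", "failed" or "aborted"
            job.done = True
            print(f"Job done: {status}")
    active_jobs = [job for job in active_jobs if not job.done]
    time.sleep(5)  # avoid flooding the service with property requests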
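Each pass of the polling loop above issues one get_blob_properties() request per blob, one after another, so with ~1000 blobs a single pass can itself take a while. A minimal sketch of concurrent polling follows. `poll_copy_statuses` and `wait_for_copies` are hypothetical helpers, `get_status` is any callable that returns a copy status string, and `max_workers=32` is an arbitrary choice. This only speeds up the status checks; it does not change how quickly the service performs the copies.

```python
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def poll_copy_statuses(targets, get_status, max_workers=32):
    """Fetch the copy status of every target concurrently; returns (target, status) pairs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each target with its own status
        return list(zip(targets, pool.map(get_status, targets)))


def wait_for_copies(targets, get_status, interval=10.0, max_workers=32):
    """Poll until no copy is "pending"; return a Counter of final statuses."""
    remaining = list(targets)
    finished = Counter()
    while remaining:
        pairs = poll_copy_statuses(remaining, get_status, max_workers)
        remaining = [target for target, status in pairs if status == "pending"]
        # Anything not pending is treated as final, e.g. "success", "failed" or "aborted"
        finished.update(status for _, status in pairs if status != "pending")
        print(f"{sum(finished.values())} finished, {len(remaining)} pending: {dict(finished)}")
        if remaining:
            time.sleep(interval)
    return finished
```

With the question's jobs, this might be called as `wait_for_copies([job.target for job in active_jobs], lambda blob: blob.get_blob_properties().copy.status)`.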

Upvotes: 0

Views: 937

Answers (2)

Danferno

Reputation: 555

How does Storage Explorer copy files so quickly and is there a way to make my Python script copy blobs faster?

Have you checked your network connection during the copy operation? If I'm not mistaken, azcopy (which Storage Explorer uses for its transfers) copies data directly between the storage accounts without passing it through the executing computer, whereas the Python SDK always downloads the blob locally and then uploads it again.

Upvotes: 0

Venkatesan

Reputation: 10302

Is there a way to make my Python script copy blobs faster?

Compared with the client library SDK, you can copy from the source to the destination storage account faster by calling the AzCopy tool from your Python program.

You can download AzCopy from the Microsoft documentation. In my environment, the source container holds a few files of about 200 MB each.


The following Python code copies blobs from one storage account to another with AzCopy.

Code:

import subprocess
import time

def copy_blob_with_azcopy(source_account_url, source_sas_token, destination_account_url, destination_sas_token, source_container, source_directory, destination_container):
    # Each SAS token is appended directly to the URL, so it must start with "?"
    source_blob_url = f"{source_account_url}/{source_container}/{source_directory}{source_sas_token}"
    destination_blob_url = f"{destination_account_url}/{destination_container}/{destination_sas_token}"

    start_time = time.time()
    # Passing a list avoids shell quoting of the "&" characters in SAS tokens;
    # check=True raises CalledProcessError if azcopy exits with a non-zero code
    subprocess.run(["azcopy", "copy", source_blob_url, destination_blob_url, "--recursive=true"], check=True)
    end_time = time.time()

    elapsed_time = end_time - start_time

    print(f"Blob copied successfully. Elapsed time: {elapsed_time} seconds")


source_account_url = "https://<storage account name>.blob.core.windows.net"
source_sas_token = "<Your-SAS-token>"  # including the leading "?"
destination_account_url = "https://<storage account name>.blob.core.windows.net"
destination_sas_token = "<Your-SAS-token>"  # including the leading "?"
source_container = "test"
source_directory = "directory1"
destination_container = "test"

copy_blob_with_azcopy(source_account_url, source_sas_token, destination_account_url, destination_sas_token, source_container, source_directory, destination_container)
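The code above copies a whole directory with --recursive=true, so the blob names stay the same. Because the question's blobs are renamed, one option is to run a separate `azcopy copy` per source/destination pair. The sketch below is a hedged illustration rather than part of this answer: `blob_url`, `build_azcopy_commands` and `run_azcopy_commands` are hypothetical helper names, both SAS tokens are assumed to include their leading `?`, and `max_parallel=4` is an arbitrary starting point. AzCopy already parallelizes work inside a single job, so whether several concurrent AzCopy processes help on your machine is something to measure.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote


def blob_url(account_url, container, blob_name, sas_token):
    # Percent-encode the blob name but keep "/" so virtual directories survive;
    # sas_token is assumed to start with "?" (or be empty)
    return f"{account_url.rstrip('/')}/{container}/{quote(blob_name, safe='/')}{sas_token}"


def build_azcopy_commands(pairs, source_account_url, source_sas_token, source_container,
                          destination_account_url, destination_sas_token, destination_container):
    """Build one `azcopy copy` argument list per (source_name, target_name) pair."""
    commands = []
    for source_name, target_name in pairs:
        src = blob_url(source_account_url, source_container, source_name, source_sas_token)
        dst = blob_url(destination_account_url, destination_container, target_name,
                       destination_sas_token)
        commands.append(["azcopy", "copy", src, dst])
    return commands


def run_azcopy_commands(commands, max_parallel=4):
    """Run the commands with at most max_parallel processes at once.

    Returns (command, return_code) tuples in the same order as the input.
    """
    def run(cmd):
        return cmd, subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))
```

For example, passing `pairs` such as `[("directory1/old-name.bin", "new-name.bin")]` through `build_azcopy_commands` and then `run_azcopy_commands` yields one exit code per AzCopy process, so any non-zero code marks a copy to retry.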

Output:

INFO: Scanning...
INFO: azcopy: A newer version 10.21.2 is available to download

INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support

Job d9cee83f-666xxxxa5dea has started
Log file is located at: C:xxxxf-666c-9b45-425a-f2cabf7a5dea.log

100.0 %, 3 Done, 0 Failed, 0 Pending, 0 Skipped, 3 Total,


Job d9cee83f-666c-xxxa summary
Elapsed Time (Minutes): 0.0669
Number of File Transfers: 3
Number of Folder Property Transfers: 0
Total Number of Transfers: 3
Number of Transfers Completed: 3
Number of Transfers Failed: 0
Number of Transfers Skipped: 0
TotalBytesTransferred: 603272234
Final Job Status: Completed

Blob copied successfully. Elapsed time: 7.026703834533691 seconds
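The job summary AzCopy prints consists of plain "Key: Value" lines, so a script can check it instead of reporting success unconditionally. A minimal sketch, assuming the text format shown in the output above (it may differ between AzCopy versions); `parse_azcopy_summary` and `azcopy_succeeded` are hypothetical helpers, and the text would come from something like `subprocess.run(..., capture_output=True, text=True).stdout`.

```python
def parse_azcopy_summary(output):
    """Collect the 'Key: Value' lines of the AzCopy job summary into a dict."""
    summary = {}
    in_summary = False
    for line in output.splitlines():
        line = line.strip()
        # The summary block starts with a line like "Job <id> summary"
        if line.startswith("Job ") and line.endswith(" summary"):
            in_summary = True
            continue
        if in_summary and ":" in line:
            key, _, value = line.partition(":")
            summary[key.strip()] = value.strip()
            if key.strip() == "Final Job Status":
                break  # last line of the summary in the output shown above
    return summary


def azcopy_succeeded(summary):
    # Treat the job as successful only if it completed with no failed transfers
    return (summary.get("Final Job Status") == "Completed"
            and summary.get("Number of Transfers Failed") == "0")
```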


Upvotes: 2
