Reputation: 7445
I need to copy about 1000 blobs at a time from one storage account to another. Each blob is roughly between 100 and 1000 MB in size. Each blob is renamed, so I cannot copy the blobs in bulk using a common prefix.
The approach I've taken is to use BlobClient.start_copy_from_url() to create an asynchronous copy operation for each blob and wait for them to complete. The problem is that it takes hours to copy the blobs this way. The operations seem to complete in batches of around 6 at a time, which makes me think something prevents more from being processed in parallel.
In comparison, it takes about 5 minutes for Storage Explorer to copy the same blobs between the storage accounts.
How does Storage Explorer copy files so quickly and is there a way to make my Python script copy blobs faster?
My code is essentially similar to this:
active_jobs = []
for job in pending_jobs:  # 1000 pending jobs
    job.target = job.target_client.get_blob_client(job.target_path)
    source = job.source_client.get_blob_client(job.source_path)
    job.target.start_copy_from_url(source.url)
    active_jobs.append(job)
while active_jobs:
    for job in active_jobs:
        status = job.target.get_blob_properties().copy.status
        if status == "success":
            job.done = True
            print("Job done")
    active_jobs = [job for job in active_jobs if not job.done]
Upvotes: 0
Views: 937
Reputation: 555
How does Storage Explorer copy files so quickly and is there a way to make my Python script copy blobs faster?
Have you checked your network connection during the copy operation? If I'm not mistaken, azcopy copies directly without passing through the executing computer, whereas the Python SDK always downloads the blob locally and then uploads it again.
Upvotes: 0
Reputation: 10302
Is there a way to make my Python script copy blobs faster?
As of now, a faster alternative to the client library SDK is to call the AzCopy tool from your Python program to copy from the source to the destination storage account.
You can download the AzCopy tool from this MS-Document.
In my environment, the source container in my blob storage holds a few files of about 200 MB each.
You can use the Python code below to copy from one storage account to another with the AzCopy tool.
Code:
import subprocess
import time

def copy_blob_with_azcopy(source_account_url, source_sas_token, destination_account_url, destination_sas_token, source_container, source_directory, destination_container):
    # Each SAS token is appended directly to its URL, so it must start with "?".
    source_blob_url = f"{source_account_url}/{source_container}/{source_directory}{source_sas_token}"
    destination_blob_url = f"{destination_account_url}/{destination_container}/{destination_sas_token}"
    start_time = time.time()
    azcopy_command = f'azcopy copy "{source_blob_url}" "{destination_blob_url}" --recursive=true'
    # check=True raises CalledProcessError if azcopy exits with a non-zero code.
    subprocess.run(azcopy_command, shell=True, check=True)
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Blob copied successfully. Elapsed time: {elapsed_time} seconds")

source_account_url = "https://<storage account name>.blob.core.windows.net"
source_sas_token = "<Your-SAS-token>"
destination_account_url = "https://<storage account name>.blob.core.windows.net"
destination_sas_token = "<Your-SAS-token>"
source_container = "test"
source_directory = "directory1"
destination_container = "test"

copy_blob_with_azcopy(source_account_url, source_sas_token, destination_account_url, destination_sas_token, source_container, source_directory, destination_container)
Output:
INFO: Scanning...
INFO: azcopy: A newer version 10.21.2 is available to download
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
Job d9cee83f-666xxxxa5dea has started
Log file is located at: C:xxxxf-666c-9b45-425a-f2cabf7a5dea.log
100.0 %, 3 Done, 0 Failed, 0 Pending, 0 Skipped, 3 Total,
Job d9cee83f-666c-xxxa summary
Elapsed Time (Minutes): 0.0669
Number of File Transfers: 3
Number of Folder Property Transfers: 0
Total Number of Transfers: 3
Number of Transfers Completed: 3
Number of Transfers Failed: 0
Number of Transfers Skipped: 0
TotalBytesTransferred: 603272234
Final Job Status: Completed
Blob copied successfully. Elapsed time: 7.026703834533691 seconds
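Since each blob in the question gets a new name, the recursive directory copy above may not fit as-is. A minimal sketch of one possible variation is to run one `azcopy copy` per blob, with the destination URL carrying the new blob name, and to run a few of these processes at a time. The helper names `build_azcopy_command` and `copy_renamed_blobs`, the `max_workers` value, and the `run` parameter are assumptions made for illustration, and both SAS tokens are assumed to start with "?". I have not measured this against 1000 large blobs, and starting one process per blob adds overhead, so treat it as a starting point only.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_azcopy_command(source_blob_url, source_sas, dest_blob_url, dest_sas):
    """Build the argument list that copies one blob to a (possibly renamed) blob.

    Assumption: both SAS tokens start with "?" so they can be appended directly.
    """
    return [
        "azcopy",
        "copy",
        f"{source_blob_url}{source_sas}",
        f"{dest_blob_url}{dest_sas}",
    ]


def copy_renamed_blobs(jobs, source_sas, dest_sas, max_workers=8, run=subprocess.run):
    """Run one azcopy process per (source_url, dest_url) pair.

    Returns the pairs whose azcopy process exited with a non-zero code.
    `run` is injectable (hypothetical parameter) so the logic can be tried
    without azcopy installed.
    """
    def copy_one(job):
        source_url, dest_url = job
        result = run(build_azcopy_command(source_url, source_sas, dest_url, dest_sas))
        return job, result.returncode

    # Threads only wait on the child processes; the data transfer is done by azcopy.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(copy_one, jobs))
    return [job for job, code in results if code != 0]
```

Here `jobs` would be a list of tuples of full blob URLs without the SAS part, for example `("https://<source account>.blob.core.windows.net/test/old-name", "https://<destination account>.blob.core.windows.net/test/new-name")`. Any pairs returned by `copy_renamed_blobs` can be retried or inspected in the AzCopy log files.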
Upvotes: 2