Reputation: 61
With very basic code that simply loops through my storage account and mirrors all containers and blobs to my local disk, I'm finding the Get-AzureStorageBlobContent cmdlet to be incredibly slow. It seems to take one to two seconds of real time per blob regardless of the blob size, which adds considerable overhead when we've got thousands of tiny files.
In contrast, on the same machine and network connection (even running simultaneously), Azure Explorer does the same bulk copy 10x to 20x faster, and AzCopy does it literally 100x faster (async), so clearly it's not a network issue.
Is there a more efficient way to use the Azure storage cmdlets, or are they just dog slow by nature? The help for Get-AzureStorageBlobContent mentions a -ConcurrentTaskCount option, which implies some async capability, but there's no documentation on how to achieve it, and given that the cmdlet only operates on a single item I'm not sure how it could help.
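For reference, the only usage I can work out is per call, something like the following ($blobName and $local_path are placeholders; the other variables are as in my full script below). As far as I can tell this would only parallelize the chunks of a single blob's transfer, not download multiple blobs at once:
# Guessing at -ConcurrentTaskCount usage; this seems to split one blob's
# transfer into concurrent chunks rather than downloading blobs in parallel.
# $blobName and $local_path are placeholders for a single blob.
Get-AzureStorageBlobContent -Context $blob_account -Container $container `
    -Blob $blobName -Destination $local_path -ConcurrentTaskCount 20 -Force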
This is the code I'm running:
$localContent = "C:\local_copy"
$storageAccountName = "myblobaccount"
$storageAccountKey = "mykey"

Import-Module Azure

# Build the storage context once and reuse it for every call
$blob_account = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey -Protocol https

Get-AzureStorageContainer -Context $blob_account | ForEach-Object {
    $container = $_.Name
    Get-AzureStorageBlob -Container $container -Context $blob_account | ForEach-Object {
        # Mirror the blob's virtual directory structure under the local root
        $local_path = "$localContent\{0}\{1}" -f $container, $_.Name
        $local_dir = Split-Path $local_path
        if (!(Test-Path $local_dir)) {
            New-Item -Path $local_dir -ItemType Directory -Force
        }
        # This is the slow part: one to two seconds per blob
        Get-AzureStorageBlobContent -Context $blob_account -Container $container -Blob $_.Name -Destination $local_path -Force | Out-Null
    }
}
Upvotes: 3
Views: 5018
Reputation: 136146
I looked at the source code for Get-AzureStorageBlobContent on GitHub and found a few interesting things that may explain the slowness of downloading blobs (especially smaller ones):
Line 165:
ICloudBlob blob = Channel.GetBlobReferenceFromServer(container, blobName, accessCondition, requestOptions, OperationContext);
This code makes a request to the server just to fetch the blob type, so you're adding one extra request to the server for each blob.
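One thing worth trying (I haven't benchmarked it) is piping the blob objects returned by Get-AzureStorageBlob straight into Get-AzureStorageBlobContent. The pipeline hands over an ICloudBlob whose type is already known, so it should at least skip this extra GetBlobReferenceFromServer round trip. A minimal sketch, reusing the variables from your script:
# Sketch: reuse the blob objects from the listing so the download cmdlet
# does not have to ask the server for the blob type again. -Destination
# here is a folder; blobs with virtual directory paths in their names may
# still need the local subfolders created first.
Get-AzureStorageContainer -Context $blob_account | ForEach-Object {
    $container = $_.Name
    New-Item -Path "$localContent\$container" -ItemType Directory -Force | Out-Null
    Get-AzureStorageBlob -Container $container -Context $blob_account |
        Get-AzureStorageBlobContent -Destination "$localContent\$container\" -Force | Out-Null
}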
Lines 252-262:
try
{
    DownloadBlob(blob, filePath);
    Channel.FetchBlobAttributes(blob, accessCondition, requestOptions, OperationContext);
}
catch (Exception e)
{
    WriteDebugLog(String.Format(Resources.DownloadBlobFailed, blob.Name, blob.Container.Name, filePath, e.Message));
    throw;
}
If you look at the code above, it first downloads the blob (DownloadBlob) and then tries to fetch the blob's attributes (Channel.FetchBlobAttributes). I haven't looked at the source code for the Channel.FetchBlobAttributes function, but I suspect it makes one more request to the server.
So to download a single blob, the code essentially makes three requests to the server, which could be the reason for the slowness. To be certain, you could trace the requests/responses through Fiddler and see exactly how the cmdlet interacts with storage.
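If you want to rule out the cmdlet overhead entirely while staying in PowerShell, you could also drive the storage client library directly so that each blob costs a single GET. A rough sketch, untested and assuming the Microsoft.WindowsAzure.Storage assembly that ships with the Azure module is already loaded (Import-Module Azure normally takes care of that), reusing the variables from your script:
# One ListBlobs call per container plus one GET per blob, instead of
# three round trips per blob through the cmdlet
$connectionString = "DefaultEndpointsProtocol=https;AccountName=$storageAccountName;AccountKey=$storageAccountKey"
$account = [Microsoft.WindowsAzure.Storage.CloudStorageAccount]::Parse($connectionString)
$client = $account.CreateCloudBlobClient()
foreach ($container in $client.ListContainers()) {
    foreach ($blob in $container.ListBlobs($null, $true)) { # $true = flat listing
        $local_path = Join-Path $localContent (Join-Path $container.Name $blob.Name)
        New-Item -ItemType Directory -Force -Path (Split-Path $local_path) | Out-Null
        # Single GET; no GetBlobReferenceFromServer or FetchAttributes calls
        $blob.DownloadToFile($local_path, [System.IO.FileMode]::Create)
    }
}
The exact DownloadToFile overload may vary with the client library version bundled in your Azure module, so treat this as a starting point rather than a drop-in replacement.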
Upvotes: 2
Reputation: 15865
Check out the Blob Transfer Utility. It uses the Azure API, and it's a good bet that's what Azure Explorer is using as well. The Blob Transfer Utility is open source, so it would be much easier to test whether the cmdlet is the problem.
Upvotes: 0