Reputation: 93754
I'm trying to list the names of the blob files in Azure Storage. I don't want the contents of the blobs, just their names.
Here is my current approach.
public static async Task<List<string>> GetBlobList()
{
    var cloudBlobContainer = await CreateCloudBlobContainer();
    BlobContinuationToken continuationToken = null;
    List<string> blobList = new List<string>();
    do
    {
        BlobResultSegment response;
        response = await cloudBlobContainer
            .ListBlobsSegmentedAsync(null, true, BlobListingDetails.None, 5000, continuationToken, null, null);
        continuationToken = response.ContinuationToken;
        foreach (CloudBlockBlob cloudBlob in response.Results.OfType<CloudBlockBlob>())
        {
            blobList.Add(cloudBlob.Name);
        }
    }
    while (continuationToken != null);
    return blobList;
}
It works perfectly fine, but retrieving 11,000 blob file names takes around 10 seconds on average.
Is there a way to improve it? I'm not looking for concrete answers here; pointers should be fine.
Upvotes: 2
Views: 2570
Reputation: 24148
I have an idea that may speed up listing the blob names by using multiple threads.
According to the API reference for the CloudBlobContainer.ListBlobsSegmentedAsync method, its first parameter, prefix, restricts the listing to blobs whose names start with that value; for example, abc.txt starts with the prefix a.
So, assuming the blob names in the container start with a-z, A-Z, 0-9, or other valid characters, or that you already know the prefixes used in the container, you can list the blob names for each prefix concurrently. Each prefix then has its own, shorter chain of ContinuationToken requests, and those chains run in parallel instead of as one long sequential chain.
Also, if you merge the per-prefix responses in prefix order, the final list is already ordered and needs no additional sort operation.
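The idea above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested implementation: the class name BlobNameLister, the method names, and the Prefixes set are hypothetical, the namespace assumes the older Microsoft.WindowsAzure.Storage package (newer versions of the same SDK use Microsoft.Azure.Storage.Blob), and it uses concurrent async tasks rather than dedicated threads. Blobs whose names start with a character outside the prefix set would be missed, so the set has to match your naming scheme.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

public static class BlobNameLister
{
    // Hypothetical prefix set: assumes every blob name starts with one of
    // these characters. Kept in ordinal order so the merged list stays ordered.
    private static readonly string[] Prefixes =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            .Select(c => c.ToString())
            .ToArray();

    public static async Task<List<string>> GetBlobListAsync(CloudBlobContainer container)
    {
        // Start one listing chain per prefix; the chains run concurrently.
        Task<List<string>>[] tasks = Prefixes
            .Select(prefix => ListByPrefixAsync(container, prefix))
            .ToArray();

        // Task.WhenAll returns the results in the same order as the tasks,
        // so concatenating them keeps the prefix order.
        List<string>[] perPrefix = await Task.WhenAll(tasks);
        return perPrefix.SelectMany(names => names).ToList();
    }

    private static async Task<List<string>> ListByPrefixAsync(
        CloudBlobContainer container, string prefix)
    {
        var names = new List<string>();
        BlobContinuationToken token = null;
        do
        {
            // Same call as in the question, but restricted to one prefix.
            BlobResultSegment segment = await container.ListBlobsSegmentedAsync(
                prefix, true, BlobListingDetails.None, 5000, token, null, null);
            token = segment.ContinuationToken;
            names.AddRange(segment.Results.OfType<CloudBlockBlob>().Select(b => b.Name));
        }
        while (token != null);
        return names;
    }
}
```

Whether this helps depends on how evenly the names are spread over the prefixes: if most blobs share one leading character, that single chain still dominates the total time.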
Hope it helps.
Upvotes: 2
Reputation: 59001
I think that is already the fastest implementation to retrieve all blob file names.
However, maybe you can slightly improve the performance by hosting a tiny REST API, e.g. in an Azure Function, that is located in the same region as your Blob Storage. This function then returns only the list of names (ListBlobsSegmentedAsync returns more metadata, which increases the payload and the loading time). With this approach, your client also makes only one remote request.
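A minimal sketch of such a function, assuming the in-process Azure Functions model with an HTTP trigger and the older Microsoft.WindowsAzure.Storage package. The function name ListBlobNames and the app settings StorageConnectionString and BlobContainerName are hypothetical placeholders, and error handling is left out:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class ListBlobNamesFunction
{
    [FunctionName("ListBlobNames")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req)
    {
        // Hypothetical app settings; replace with your own configuration.
        string connectionString = Environment.GetEnvironmentVariable("StorageConnectionString");
        string containerName = Environment.GetEnvironmentVariable("BlobContainerName");

        CloudBlobContainer container = CloudStorageAccount.Parse(connectionString)
            .CreateCloudBlobClient()
            .GetContainerReference(containerName);

        // The paging loop runs next to the storage account, so each
        // round trip for a continuation token stays inside the region.
        var names = new List<string>();
        BlobContinuationToken token = null;
        do
        {
            BlobResultSegment segment = await container.ListBlobsSegmentedAsync(
                null, true, BlobListingDetails.None, 5000, token, null, null);
            token = segment.ContinuationToken;
            names.AddRange(segment.Results.OfType<CloudBlockBlob>().Select(b => b.Name));
        }
        while (token != null);

        // Only the names are serialized back to the caller.
        return new OkObjectResult(names);
    }
}
```

The client then issues a single GET request and receives a JSON array of names; the total listing work is unchanged, only the paging round trips and the payload sent to the client shrink.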
Upvotes: 1