Hippolyte Joubin

Reputation: 13

Listing blobs by prefix in a CloudBlobDirectory object

I have an Azure container with directories named by date (e.g. 20201203 contains all the files created on December 3, 2020). The files in each directory are named like this: {filename}{format}{extension}. For example, in directory 20201203 I have these 3 files:

I want to get all the blobs for a specific file name (that is, all the files with the different formats). The ListBlobs() method of a CloudBlobContainer object can take a string argument that filters the blobs by prefix. The problem is that in my case I have a CloudBlobDirectory object, not a CloudBlobContainer, and the ListBlobs() method of CloudBlobDirectory has no overload with a simple string parameter to use as a prefix filter.

I could of course retrieve all the blobs of the directory with var blobs = myAzureDirectory.ListBlobs() and then check for each blob whether its name starts with the name I'm looking for. But this would perform poorly, as I have a lot of files in each directory and a lot of directories to process.
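As a small illustration of the naming scheme described above, the sketch below builds the directory name and the full blob-name prefix for one file on one day. It assumes the default '/' delimiter and a yyyyMMdd directory format inferred from the example 20201203; the class and method names (BlobNaming, DirectoryName, FilePrefix) and the file name "report" are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers for the naming scheme in the question: blobs live under
// a "yyyyMMdd" virtual directory and are named {filename}{format}{extension}.
public static class BlobNaming
{
    // Virtual directory name for a given day, e.g. 2020-12-03 -> "20201203".
    public static string DirectoryName(DateTime date) =>
        date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);

    // Blob-name prefix shared by every format of one file on one day,
    // e.g. (2020-12-03, "report") -> "20201203/report".
    // Assumes '/' is the virtual directory delimiter.
    public static string FilePrefix(DateTime date, string fileName) =>
        DirectoryName(date) + "/" + fileName;
}
```

For example, FilePrefix(new DateTime(2020, 12, 3), "report") returns "20201203/report", the kind of string a container-level prefix filter would need in order to match every format of that file in that one directory.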

Is there a good way to do that? Thanks!

Upvotes: 1

Views: 2440

Answers (1)

Stanley Gong

Reputation: 12153

As we know, the Azure Storage SDKs are built on the Azure Storage REST APIs; for instance, ListBlobs() is based on the List Blobs REST operation. As you can see, this API only provides a prefix parameter to filter blobs by name prefix. But in your scenario you want to list matching files under many different virtual folders, and in Azure Storage the virtual folder path is part of the blob name (e.g. 20201203/{filename}{format}{extension}), so a file-name prefix alone doesn't help here.
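To make the prefix semantics concrete, here is a local model of the List Blobs prefix filter, assuming it behaves like an ordinal (case-sensitive) "starts with" test on the full blob name. PrefixModel, FilterByPrefix, and the blob names used with it are hypothetical and only mimic the service; no SDK call is made.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local model (not an SDK call) of the List Blobs "prefix" filter:
// the prefix is compared against the FULL blob name, virtual directory included.
public static class PrefixModel
{
    public static List<string> FilterByPrefix(IEnumerable<string> blobNames, string prefix) =>
        blobNames
            .Where(name => name.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
}
```

With hypothetical blobs such as 20201203/report_small.jpg and 20201204/report_small.jpg, the prefix "report" matches nothing, while "20201203/report" matches only that day's files. Finding "report" in every date directory would therefore need one prefix, and so one listing call, per directory.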

Your approach performs poorly because it loops over all of your virtual directories and queries each one: every iteration costs at least one HTTP request, and since you have a lot of directories in your container, those round trips add up.
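The cost difference can be estimated with simple arithmetic. The sketch below assumes the List Blobs maximum page size of 5,000 results per request and counts an empty listing as one request; ListingCost and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rough request-count estimate. Assumes each List Blobs call returns at most
// 5,000 results and that even an empty listing costs one request.
public static class ListingCost
{
    const int PageSize = 5000;

    // Number of paged requests needed to list blobCount blobs (ceiling division).
    static int Pages(int blobCount) =>
        Math.Max(1, (blobCount + PageSize - 1) / PageSize);

    // One separate listing per virtual directory.
    public static int PerDirectoryRequests(IEnumerable<int> blobsPerDirectory) =>
        blobsPerDirectory.Sum(n => Pages(n));

    // One flat listing over the whole container.
    public static int FlatListingRequests(IEnumerable<int> blobsPerDirectory) =>
        Pages(blobsPerDirectory.Sum());
}
```

For three hypothetical directories holding 1,200, 800 and 3,000 blobs, per-directory listing needs 3 requests while one flat listing needs 1. The gap grows with the number of small directories; with a few very large directories the two approaches cost roughly the same, because the flat listing still has to page through every blob.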

Instead, you can fetch the list of all the blobs in your container with a single flat listing and filter them yourself. That takes only one List Blobs request per 5,000 blobs (the service's maximum page size) instead of at least one request per directory, and filtering locally is much faster than making API calls. Try the code below:

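using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using System;

namespace sdkv11
{
    class Program
    {
        static void Main(string[] args)
        {
            var connstr = "<connection string>";
            var containerName = "<your container name>";
            var filePrefix = "<file prefix>";

            var storageAccount = CloudStorageAccount.Parse(connstr);
            var container = storageAccount.CreateCloudBlobClient().GetContainerReference(containerName);

            // Flat listing: enumerates every blob in the container across all
            // virtual directories; results are fetched lazily, page by page.
            var blobs = container.ListBlobs(useFlatBlobListing: true);

            foreach (var blob in blobs)
            {
                // The file name is the part of the path after the last '/',
                // i.e. the date directory is stripped off.
                var path = blob.Uri.LocalPath;
                var fileName = path.Substring(path.LastIndexOf('/') + 1);

                // Ordinal comparison, since blob names are case-sensitive.
                if (fileName.StartsWith(filePrefix, StringComparison.Ordinal))
                {
                    Console.WriteLine(blob.Uri);
                }
            }

            Console.ReadKey();
        }
    }
}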

Result:

[Image: console output of the program]
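The matching rule in the program above can be pulled out into an SDK-free helper so it can be unit-tested without a storage account. This is a sketch that works on plain blob names rather than IListBlobItem.Uri; FileNameFilter, FileNameOf, and Filter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// SDK-free version of the filter in the program above.
public static class FileNameFilter
{
    // Part of a blob name after the last '/',
    // e.g. "20201203/report_small.jpg" -> "report_small.jpg".
    // A name without '/' (LastIndexOf returns -1) is returned unchanged.
    public static string FileNameOf(string blobName) =>
        blobName.Substring(blobName.LastIndexOf('/') + 1);

    // Keeps blobs from any virtual directory whose file name starts with filePrefix.
    public static List<string> Filter(IEnumerable<string> blobNames, string filePrefix) =>
        blobNames
            .Where(name => FileNameOf(name).StartsWith(filePrefix, StringComparison.Ordinal))
            .ToList();
}
```

For example, with hypothetical names, Filter(names, "report") keeps report_small.jpg from every date directory as well as a root-level report_root.txt, because only the part after the last '/' is compared.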

Upvotes: 1
