Jonathan

Reputation: 1755

Faster way to get Azure Storage Container File count and Size?

I am trying to get the file count and total file size of each container, but it is very slow. This is the code I am currently using:

var blobServiceClient = new BlobServiceClient(connectionStr);
var blobs = blobServiceClient.GetBlobContainers();
foreach (var blob in blobs)
{
    var containerClient = blobServiceClient.GetBlobContainerClient(blob.Name);
    var blobItems = containerClient.GetBlobs();
    var fileCount = blobItems.Count();

    long fileSize = 0;
    foreach (var blobItem in blobItems)
    {
        var blobClient = containerClient.GetBlobClient(blobItem.Name);
        var properties = blobClient.GetProperties();
        fileSize += properties.Value.ContentLength;
    }

    var storageInfo = new StorageInformation()
    {
        Customer_GUID = new Guid(blob.Name),
        FileCount = fileCount,
        StorageSize = fileSize
    };
    dbContext.StorageInformation.Add(storageInfo);
    await dbContext.SaveChangesAsync();
}

Is there a way to do this faster? I have about 500 containers averaging 40k blobs in each one.

Upvotes: 0

Views: 1669

Answers (1)

Rajesh Mopati

Reputation: 1506

Thanks @Peter Bons for the comment.

Performance can be improved with a few small code changes and by releasing resources promptly after use. Indexing the blobs can also help (see Indexing Azure Blobs).

Parallel programming is another way to speed this up.

You can also save only the blobs' metadata in a database and fetch it from the database as required.
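
One small code change that may help, offered as a sketch rather than a tested fix: the blob listing already returns each blob's size in `BlobItem.Properties.ContentLength`, so the count and the total size can be gathered in a single pass without calling `GetProperties()` for every blob. The names `ContainerStats` and `GetStatsAsync` are made up for illustration, and the code assumes the Azure.Storage.Blobs v12 package and C# 8 or later.

```csharp
using System.Threading.Tasks;
using Azure.Storage.Blobs;

public static class ContainerStats
{
    // Hypothetical helper: counts the blobs in one container and sums their
    // sizes while enumerating the listing exactly once.
    public static async Task<(int FileCount, long StorageSize)> GetStatsAsync(
        BlobContainerClient containerClient)
    {
        int fileCount = 0;
        long storageSize = 0;

        // The SDK pages through the listing; each BlobItem carries its size,
        // so no per-blob GetProperties() request is made here.
        await foreach (var blobItem in containerClient.GetBlobsAsync())
        {
            fileCount++;
            storageSize += blobItem.Properties.ContentLength ?? 0;
        }

        return (fileCount, storageSize);
    }
}
```

In the question's code, `blobItems.Count()` and the `foreach` each enumerate the listing, and `GetProperties()` adds one request per blob. With about 40k blobs per container, that is roughly 40k extra round trips per container that this version avoids.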

Using Parallel.ForEach loops

Sample Code:

Parallel.ForEach(integerList, i =>
{
    long total = DoSomeIndependentTimeconsumingTask();
    Console.WriteLine("{0} - {1}", i, total);
});

// Sequential version

foreach (var item in sourceCollection)
{
    Process(item);
}

// Parallel equivalent

Parallel.ForEach(sourceCollection, item => Process(item));
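
Applied to per-container totals, the parallel loop might look like the sketch below. It uses in-memory arrays as a stand-in for the blob sizes of each container, so `ParallelTotals`, `SumSizes`, and `TotalsPerContainer` are hypothetical names and not part of any SDK. Results go into a `ConcurrentDictionary` because the loop body runs on several threads, and `MaxDegreeOfParallelism` caps how many containers are processed at once.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelTotals
{
    // Hypothetical stand-in for the per-container work
    // (for example, listing the blobs and summing their sizes).
    public static long SumSizes(IEnumerable<long> sizes) => sizes.Sum();

    public static IDictionary<string, long> TotalsPerContainer(
        IReadOnlyDictionary<string, long[]> containers, int maxParallelism = 8)
    {
        // Thread-safe store: several loop iterations may write at the same time.
        var totals = new ConcurrentDictionary<string, long>();

        // Bound the parallelism so the loop does not start unlimited work.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(containers, options, pair =>
        {
            totals[pair.Key] = SumSizes(pair.Value);
        });

        return totals;
    }
}
```

Each container is independent of the others, which is what makes the loop safe to parallelize. For I/O-bound work such as storage requests, `Parallel.ForEachAsync` (available from .NET 6) with an async body may be a better fit than blocking threads in `Parallel.ForEach`.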

Blob metadata can be indexed, which helps if you expect any custom metadata properties to be useful in filters and queries.

.NET supports parallel programming by providing a runtime, class library types, and diagnostic tools. These were introduced in .NET Framework 4 and simplify parallel development: you can write parallel code without having to work directly with threads or the thread pool.

References:

parallel-programming

unbounded-parallelism

Upvotes: 1
