brennazoon

Reputation: 1459

Getting blob count in an Azure Storage container

What is the most efficient way to get the number of blobs in an Azure Storage container?

Right now I can't think of any way other than the code below:

CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();

Upvotes: 39

Views: 66028

Answers (15)

ns15

Reputation: 8824

This answer is for someone with large blob storage with millions of blobs.

The top-rated answer on this thread is pretty much unusable for large blob storage accounts. The Azure Storage Explorer application simply calls the List Blobs API under the hood, which is paginated and returns at most 5,000 records at a time. If you have millions of blobs, it will take a very long time to return the blob count.
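
To put the pagination cost in numbers, here is a rough sketch in Python. The 5,000-per-page limit is the one mentioned above; the function name and the example blob count are illustrative assumptions.

```python
# Maximum number of results one List Blobs call returns, as described above.
PAGE_SIZE = 5000


def list_requests_needed(blob_count: int, page_size: int = PAGE_SIZE) -> int:
    """Return the minimum number of list calls needed to enumerate blob_count blobs."""
    if blob_count < 0:
        raise ValueError("blob_count must be non-negative")
    # Ceiling division; even an empty container costs one request.
    return max(1, (blob_count + page_size - 1) // page_size)


# Hypothetical container with 10 million blobs.
print(list_requests_needed(10_000_000))  # 2000
```

Within a single listing, each page's continuation token comes from the previous response, so these calls run one after another and the total time grows with the blob count.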

If you are OK with an approximate value, the storage browser option in the Azure portal is extremely useful. Note, however, that this value is not very accurate for storage accounts with a high rate of write/delete operations.

This data should be visible by default. If it is not, enable the diagnostic metrics under Monitoring -> Diagnostic settings (classic): turn the status on and enable the hour metrics.

If you want more accurate results, the only option is to enable the blob storage inventory report. The downside is that this is a background job, and the report can be generated at most once per day. The Azure Storage blob inventory documentation describes how to set it up. For large storage accounts, my suggestion is to generate a Parquet report every day and, when you need to inspect or read the report, use DBeaver (along with DuckDB), Databricks, or Synapse.
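
If the inventory rule is set to write CSV rather than Parquet, a count can be taken with nothing more than the Python standard library. The sketch below assumes the report has a header row and one row per blob; the sample rows and the `count_inventory_rows` name are made up for illustration, and a real report would first be downloaded from the destination container.

```python
import csv
import io


def count_inventory_rows(report) -> int:
    """Count data rows in an inventory CSV, streaming so memory use stays flat."""
    reader = csv.reader(report)
    next(reader, None)  # skip the header row (assumed present)
    return sum(1 for row in reader if row)  # ignore blank lines


# Made-up sample standing in for a downloaded report file.
sample = io.StringIO(
    "Name,Content-Length\n"
    "directory/a.txt,120\n"
    "directory/b.txt,64\n"
    "root.txt,9\n"
)
print(count_inventory_rows(sample))  # 3
```

For a file on disk, pass the handle from `open(path, newline="")` instead of the `StringIO` object.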

If you do not wish to use the inventory report, a PowerShell script can achieve something similar. However, it can take many hours to return the blob count on large storage accounts.

Upvotes: 1

Francois

Reputation: 934

With the Azure CLI it would be as follows:

az storage blob list --account-name <name> --container-name <name> --num-results "*" --query "length(@)"

Upvotes: 1

Jacob Foshee

Reputation: 2773

Bearing in mind all the performance concerns from the other answers, here is a version for v12 of the Azure SDK leveraging IAsyncEnumerable. This requires a package reference to System.Linq.Async.

public async Task<int> GetBlobCount()
{
    var container = await GetBlobContainerClient();
    var blobsPaged = container.GetBlobsAsync();
    return await blobsPaged
        .AsAsyncEnumerable()
        .CountAsync();
}

Upvotes: 2

Moataz Allam

Reputation: 1

You can use this:

public static async Task<List<IListBlobItem>> ListBlobsAsync()
{
    // Resolve the container once rather than on every page.
    CloudBlobContainer container = GetContainer("containerName");
    BlobContinuationToken continuationToken = null;
    List<IListBlobItem> results = new List<IListBlobItem>();
    do
    {
        // Flat listing, up to 5000 blobs per page.
        var response = await container.ListBlobsSegmentedAsync(null,
            true, BlobListingDetails.None, 5000, continuationToken, null, null);

        continuationToken = response.ContinuationToken;
        results.AddRange(response.Results);
    } while (continuationToken != null);
    return results;
}

and then call

var count = (await ListBlobsAsync()).Count;

Hope it will be useful.

Upvotes: 0

Varun Garg

Reputation: 2654

The List Blobs approach is accurate but slow if you have millions of blobs. Another way, which works in only a few cases but is relatively fast, is to query the MetricsHourPrimaryTransactionsBlob table. It is at the account level, and the metrics are aggregated hourly.

https://learn.microsoft.com/en-us/azure/storage/common/storage-analytics-metrics

Upvotes: 0

sprash

Reputation: 345

If you are using the Azure.Storage.Blobs library, you can use something like the following:

public int GetBlobCount(string containerName)
{
    BlobContainerClient container = new BlobContainerClient(blobConnectionString, containerName);
    // GetBlobs() pages through the container lazily; Count() (System.Linq) enumerates every page.
    return container.GetBlobs().Count();
}

Upvotes: 1

OzBob

Reputation: 4530

Count all blobs in a classic and new blob storage account. Building on @gandikota-saikoushik, this solution works for blob containers with a very large number of blobs.

// Setup: set these values from the Azure Portal
var accountName = "<ACCOUNTNAME>";
var accountKey = "<ACCOUNTKEY>";
var containerName = "<CONTAINERNAME>";
var uristr = $"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey}";

var storageAccount = Microsoft.WindowsAzure.Storage.CloudStorageAccount.Parse(uristr);
var client = storageAccount.CreateCloudBlobClient();
var container = client.GetContainerReference(containerName);
var blobcount = CountBlobs(container).ConfigureAwait(false).GetAwaiter().GetResult();
Console.WriteLine($"blobcount:{blobcount}");


public static async Task<int> CountBlobs(CloudBlobContainer container)
{
    BlobContinuationToken continuationToken = null;
    var result = 0;
    do
    {
        // Each call returns one page of results plus a token for the next page.
        var response = await container.ListBlobsSegmentedAsync(continuationToken);
        continuationToken = response.ContinuationToken;
        result += response.Results.Count();
    }
    while (continuationToken != null);

    return result;
}

Upvotes: 0

Gandikota Saikoushik

Reputation: 103

I spent quite a while finding the solution below. I don't want someone like me to waste time, so I am replying here even after 9 years.

package com.sai.koushik.gandikota.test.app;

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.*;


public class AzureBlobStorageUtils {


    public static void main(String[] args) throws Exception {
        AzureBlobStorageUtils getCount =  new AzureBlobStorageUtils();
        String storageConn = "<StorageAccountConnection>";
        String blobContainerName = "<containerName>";
        String subContainer =  "<subContainerName>";
        Integer fileContainerCount = getCount.getFileCountInSpecificBlobContainersSubContainer(storageConn,blobContainerName, subContainer);
        System.out.println(fileContainerCount);
    }

    public Integer getFileCountInSpecificBlobContainersSubContainer(String storageConn, String blobContainerName, String subContainer) throws Exception {
        try {
            CloudStorageAccount storageAccount = CloudStorageAccount.parse(storageConn);
            CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
            CloudBlobContainer blobContainer = blobClient.getContainerReference(blobContainerName);
            return ((CloudBlobDirectory) blobContainer.listBlobsSegmented().getResults().stream().filter(listBlobItem -> listBlobItem.getUri().toString().contains(subContainer)).findFirst().get()).listBlobsSegmented().getResults().size();
        } catch (Exception e) {
            throw new Exception(e.getMessage());
        } 
    }

}


Upvotes: 0

k.antipov

Reputation: 41

Another Python example. It is slow, but works correctly with more than 5,000 files:

from azure.storage.blob import BlobServiceClient

constr="Connection string"
container="Container name"

blob_service_client = BlobServiceClient.from_connection_string(constr)
container_client = blob_service_client.get_container_client(container)
blobs_list = container_client.list_blobs()

num = 0
size = 0
for blob in blobs_list:
    num += 1
    size += blob.size
    print(blob.name,blob.size)

print("Count: ", num)
print("Size: ", size)

Upvotes: 0

Matt

Reputation: 721

If you just want to know how many blobs are in a container without writing code you can use the Microsoft Azure Storage Explorer application.

  1. Open the desired BlobContainer.
  2. Click the Folder Statistics icon.
  3. Observe the count of blobs in the Activities window.

Upvotes: 48

David Airapetyan

Reputation: 5620

I tried counting blobs using ListBlobs() and for a container with about 400,000 items, it took me well over 5 minutes.

If you have complete control over the container (that is, you control when writes occur), you could cache the size information in the container metadata and update it every time an item gets removed or inserted. Here is a piece of code that would return the container blob count:

static int CountBlobs(string storageAccount, string containerId)
{
    CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(storageAccount);
    CloudBlobClient blobClient = cloudStorageAccount.CreateCloudBlobClient();
    CloudBlobContainer cloudBlobContainer = blobClient.GetContainerReference(containerId);

    cloudBlobContainer.FetchAttributes();

    string count = cloudBlobContainer.Metadata["ItemCount"];
    string countUpdateTime = cloudBlobContainer.Metadata["CountUpdateTime"];

    bool recountNeeded = false;

    if (String.IsNullOrEmpty(count) || String.IsNullOrEmpty(countUpdateTime))
    {
        recountNeeded = true;
    }
    else
    {
        DateTime dateTime = new DateTime(long.Parse(countUpdateTime));

        // Are we close to the last modified time?
        if (Math.Abs(dateTime.Subtract(cloudBlobContainer.Properties.LastModifiedUtc).TotalSeconds) > 5) {
            recountNeeded = true;
        }
    }

    int blobCount;
    if (recountNeeded)
    {
        blobCount = 0;
        BlobRequestOptions options = new BlobRequestOptions();
        options.BlobListingDetails = BlobListingDetails.Metadata;

        foreach (IListBlobItem item in cloudBlobContainer.ListBlobs(options))
        {
            blobCount++;
        }

        cloudBlobContainer.Metadata.Set("ItemCount", blobCount.ToString());
        // Use UTC so the stored ticks are comparable with Properties.LastModifiedUtc.
        cloudBlobContainer.Metadata.Set("CountUpdateTime", DateTime.UtcNow.Ticks.ToString());
        cloudBlobContainer.SetMetadata();
    }
    else
    {
        blobCount = int.Parse(count);
    }

    return blobCount;
}

This, of course, assumes that you update ItemCount/CountUpdateTime every time the container is modified. CountUpdateTime is a heuristic safeguard (if the container did get modified without someone updating CountUpdateTime, this will force a re-count) but it's not reliable.

Upvotes: 16

Bill Christenson

Reputation: 819

If you are not using virtual directories, the following will work as previously answered.

CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();

However, the above code snippet may not have the desired count if you are using virtual directories.

For instance, if your blobs are stored like /container/directory/filename.txt, where the blob name is directory/filename.txt, then container.ListBlobs().Count() counts only the top-level entries, that is, how many "/directory" virtual directories you have. To count the blobs contained within virtual directories, set useFlatBlobListing = true in the ListBlobs() call.

CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs(null, true).Count();

Note: the ListBlobs() call with useFlatBlobListing = true is a much more expensive/slow call...
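
The difference between the two listing modes can be illustrated without touching storage. The sketch below is plain Python over a made-up list of blob names; it mimics how a hierarchical listing collapses everything under a `/`-delimited prefix into one entry, and is not the SDK's implementation.

```python
def hierarchical_count(blob_names, delimiter="/"):
    """Count top-level entries: each virtual directory once, plus root-level blobs."""
    entries = set()
    for name in blob_names:
        head, sep, _ = name.partition(delimiter)
        # A name containing the delimiter is represented by its first-level prefix.
        entries.add(head + sep if sep else name)
    return len(entries)


def flat_count(blob_names):
    """Count every blob, regardless of virtual directory depth."""
    return sum(1 for _ in blob_names)


names = ["directory/a.txt", "directory/b.txt", "directory/sub/c.txt", "root.txt"]
print(hierarchical_count(names))  # 2
print(flat_count(names))          # 4
```

Here the hierarchical count sees only `directory/` and `root.txt`, while the flat count sees all four blobs.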

Upvotes: 2

Mustafa Celik

Reputation: 2399

With the Python API for Azure Storage, it looks like this:

from azure.storage import *
blob_service = BlobService(account_name='myaccount', account_key='mykey')
blobs = blob_service.list_blobs('mycontainer')
len(blobs)  # returns the number of blobs in the container

Upvotes: 1

Sambo

Reputation: 31

Example using PHP API and getNextMarker.

Counts total number of blobs in an Azure container. It takes a long time: about 30 seconds for 100000 blobs.

(assumes we have a valid $connectionString and a $container_name)

$blobRestProxy = ServicesBuilder::getInstance()->createBlobService($connectionString);
$opts = new ListBlobsOptions();
$nblobs = 0;
$cont = true; // keep looping until there is no next marker

while($cont) {

  $blob_list = $blobRestProxy->listBlobs($container_name, $opts);      

  $nblobs += count($blob_list->getBlobs());

  $nextMarker = $blob_list->getNextMarker();

  if (!$nextMarker || strlen($nextMarker) == 0) $cont = false;
  else $opts->setMarker($nextMarker);
}
echo $nblobs;

Upvotes: 3

David Makogon

Reputation: 71121

The API doesn't contain a container count method or property, so you'd need to do something like what you posted. However, you'll need to deal with NextMarker if the listing exceeds 5,000 items (or if you specify a maximum number of results to return and the list exceeds that number). You'd then make additional calls based on NextMarker and add up the counts.

EDIT: Per smarx: the SDK should take care of NextMarker for you. You'll need to deal with NextMarker if you're working at the API level, calling List Blobs through REST.
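
At the REST level, the loop looks roughly like the sketch below, written in Python. `fetch_page` is a hypothetical callable standing in for one List Blobs request: it takes a marker (or `None` for the first call) and returns the blobs on that page plus the next marker, with an empty marker meaning the listing is complete. The fake page source exists only so the example runs on its own.

```python
def count_blobs(fetch_page):
    """Sum page sizes, following markers until the service stops returning one."""
    total = 0
    marker = None
    while True:
        blobs, marker = fetch_page(marker)
        total += len(blobs)
        if not marker:  # None or "" means there are no more pages
            return total


def make_fake_source(blob_names, page_size):
    """Build a fetch_page stand-in that serves blob_names in fixed-size pages."""
    def fetch_page(marker):
        start = int(marker) if marker else 0
        end = start + page_size
        next_marker = str(end) if end < len(blob_names) else ""
        return blob_names[start:end], next_marker
    return fetch_page


names = [f"blob-{i}" for i in range(12)]
print(count_blobs(make_fake_source(names, page_size=5)))  # 12
```

Only the page sizes are accumulated, so memory use stays constant no matter how many blobs the container holds.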

Alternatively, if you're controlling the blob insertions/deletions (through a WCF service, for example), you can use the blob container's metadata area to store a cached blob count that you update with each insert or delete. You'll just need to deal with write concurrency to the container.

Upvotes: 11
