aaron

Reputation: 597

How to list all blobs inside of a specific subdirectory in Azure Cloud Storage using Python?

I worked through the example code from the Azure docs https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python

from azure.storage.blob import BlockBlobService
account_name = "x"
account_key = "x"
top_level_container_name = "top_container"

blob_service = BlockBlobService(account_name, account_key)

print("\nList blobs in the container")
generator = blob_service.list_blobs(top_level_container_name)
for blob in generator:
    print("\t Blob name: " + blob.name)

Now I would like to know how to walk my container in a more fine-grained way. My container top_level_container_name has several subdirectories.

I would like to be able to list all of the blobs that are inside just one of those directories. For instance

How do I get a generator of just the contents of dir1 without having to walk all of the other dirs? (I would also take a list or dictionary)

I tried adding /dir1 to the name of the top_level_container_name so it would be top_level_container_name = "top_container/dir1" but that didn't work. I get back an error code azure.common.AzureHttpError: The requested URI does not represent any resource on the server. ErrorCode: InvalidUri

The docs do not seem to even have any info on BlockBlobService.list_blobs() https://learn.microsoft.com/en-us/python/api/azure.storage.blob.blockblobservice.blockblobservice?view=azure-python

Update: list_blobs() comes from https://github.com/Azure/azure-storage-python/blob/ff51954d1b9d11cd7ecd19143c1c0652ef1239cb/azure-storage-blob/azure/storage/blob/baseblobservice.py#L1202

Upvotes: 27

Views: 71021

Answers (5)

user3590035

Reputation: 328

The parameter is name_starts_with. The code will look like this: container.list_blobs(name_starts_with=prefix_value)

Here prefix_value is the path of the directory inside the container, for example "dir1/".

Please check the documentation: https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.containerclient?view=azure-python#azure-storage-blob-containerclient-list-blobs

Upvotes: 0

Palash Mondal

Reputation: 538

To get the blob files inside a directory or subdirectory, pass its path as the prefix:

from azure.storage.blob import BlockBlobService
blob_service = BlockBlobService(account_name, account_key)
blobfile = []
generator = blob_service.list_blobs(container_name, prefix="filepath/", delimiter="")
for blob in generator:
    blobname = blob.name.split('/')[-1]
    blobfile.append(blobname)
    print("\t Blob name: " + blob.name)
print(blobfile)

In the above code, use delimiter="/" instead to get nested blobs grouped as folders.
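
To make the effect of prefix and delimiter more concrete, here is a minimal local sketch. It does not call Azure at all: list_level is a hypothetical helper and the blob names are made up. It only imitates how a flat list of blob names is filtered by a prefix and, when a delimiter is given, collapsed into virtual folders.

```python
def list_level(blob_names, prefix="", delimiter=""):
    """Mimic locally how a prefix and delimiter filter a flat list of blob names.

    Pure-Python illustration only, not a call into the Azure SDK.
    Returns (blobs, folders): the full names of matching blobs, and the
    virtual folder prefixes that stand in for everything nested below them.
    """
    blobs, folders = [], []
    for name in sorted(blob_names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Collapse everything below the first delimiter into one folder entry.
            folder = prefix + rest.split(delimiter, 1)[0] + delimiter
            if folder not in folders:
                folders.append(folder)
        else:
            blobs.append(name)
    return blobs, folders


# Hypothetical blob names, for illustration only.
names = ["filepath/a.csv", "filepath/sub/b.csv", "filepath/sub/c.csv", "other/d.csv"]

# Empty delimiter: every blob under the prefix, however deeply nested.
flat_blobs, flat_folders = list_level(names, prefix="filepath/", delimiter="")

# "/" delimiter: only direct children, with nested blobs grouped as folders.
one_level_blobs, one_level_folders = list_level(names, prefix="filepath/", delimiter="/")
```

With an empty delimiter the nested blobs come back individually; with "/" they are represented by the single folder entry "filepath/sub/".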

Upvotes: 3

Erfan

Reputation: 42886

The module azurebatchload provides this and more. You can filter on folder or filenames, and choose to get the result in various formats:

  • list
  • dictionary with extended info
  • pandas dataframe

1. List a whole container with just the filenames as a list.

from azurebatchload import Utils

list_blobs = Utils(container='containername').list_blobs()

2. List a whole container with just the filenames as a dataframe.

from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   dataframe=True
).list_blobs()

3. List a folder in a container.

from azurebatchload import Utils

list_blobs = Utils(
   container='containername',
   name_starts_with="foldername/"
).list_blobs()

4. Get extended information for a folder.

from azurebatchload import Utils

dict_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True
).list_blobs()

5. Get extended information for a folder, returned as a pandas dataframe.

from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True,
   dataframe=True
).list_blobs()

Disclaimer: I am the author of the azurebatchload module.

Upvotes: 8

Prashant Babber

Reputation: 481

I was not able to import BlockBlobService. It seems that BlobServiceClient is the new alternative. I followed the official docs and found this:

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

Create a Blob Storage Account client

connect_str = "<connectionstring>"
blob_service_client = BlobServiceClient.from_connection_string(connect_str)

Create a container client

container_name="dummy"
container_client=blob_service_client.get_container_client(container_name)

This will list all blobs in the container inside the dir1 folder/directory

blob_list = container_client.list_blobs(name_starts_with="dir1/")
for blob in blob_list:
    print("\t" + blob.name)
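
One pitfall with name_starts_with is that it is a plain string prefix, so "dir1" without the trailing slash would also match blobs under "dir10/". A small wrapper can guard against that. This is only a sketch: list_blob_names_under is a hypothetical helper name, and it assumes container_client is a ContainerClient like the one created above (anything with a compatible list_blobs method would work).

```python
def list_blob_names_under(container_client, directory):
    """Return the names of all blobs whose name starts with `directory/`.

    `container_client` is assumed to be an azure.storage.blob.ContainerClient
    (or any object with a compatible list_blobs method).
    """
    # Blob "directories" are only name prefixes, so make sure the prefix ends
    # with "/"; otherwise "dir1" would also match "dir10/file.txt".
    cleaned = directory.strip("/")
    prefix = cleaned + "/" if cleaned else None  # None lists the whole container
    return [blob.name for blob in container_client.list_blobs(name_starts_with=prefix)]
```

Usage would then be something like names = list_blob_names_under(container_client, "dir1"), which also accepts "/dir1/" or "dir1/".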

Upvotes: 35

Gaurav Mantri

Reputation: 136136

Please try something like:

generator = blob_service.list_blobs(top_level_container_name, prefix="dir1/")

This should list blobs and folders in the dir1 virtual directory.

If you want to list all blobs inside dir1 virtual directory, please try something like:

generator = blob_service.list_blobs(top_level_container_name, prefix="dir1/", delimiter="")

For more information, please see this link.
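
For readers on the newer azure-storage-blob v12 package, where BlockBlobService is no longer importable, a rough counterpart of the delimiter-based listing is ContainerClient.walk_blobs. The sketch below is an assumption-laden illustration rather than a drop-in replacement: list_one_level is a hypothetical helper, and it tells virtual folders apart from blobs simply by checking whether the returned name ends with the delimiter.

```python
def list_one_level(container_client, directory):
    """Split the immediate children of `directory` into blobs and sub-folders.

    Assumes `container_client` is an azure.storage.blob.ContainerClient from
    the v12 SDK, whose walk_blobs() yields blob items plus prefix items whose
    names end with the delimiter.
    """
    prefix = directory.strip("/") + "/"
    blobs, folders = [], []
    for item in container_client.walk_blobs(name_starts_with=prefix, delimiter="/"):
        # Assumption: virtual folders come back as entries ending in "/".
        # A real blob whose name itself ends in "/" would be misclassified here.
        if item.name.endswith("/"):
            folders.append(item.name)
        else:
            blobs.append(item.name)
    return blobs, folders
```

Calling list_one_level(container_client, "dir1") would return the blobs directly inside dir1 and the names of its sub-folders, without descending into them.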

Upvotes: 41
