Moondra

Reputation: 4531

How to access files within subfolders of a GCS bucket using Python?

from google.cloud import storage

client = storage.Client()  # picks up the default credentials and project
bucket = client.get_bucket('path to bucket')

The above code connects me to my bucket but I am struggling to connect with a specific folder within the bucket.

I am trying variants of this code, but no luck:

blob = bucket.get_blob("training/bad")
blob = bucket.get_blob("/training/bad")
blob = bucket.get_blob("path to bucket/training/bad")

I am hoping to get a list of the images within the bad subfolder, but I can't seem to do so. I don't even fully understand what a blob is despite reading the docs, and I am sort of winging it based on tutorials.

Thank you.

Upvotes: 9

Views: 10948

Answers (2)

ScottMcC

Reputation: 4460

If you would like to find the blobs (files) that exist under a specific prefix (subdirectory), you can pass the prefix and delimiter arguments to the bucket's list_blobs() method.

See the following example, taken from the Google Listing Objects example (also available as a GitHub snippet):

from google.cloud import storage


def list_blobs_with_prefix(bucket_name, prefix, delimiter=None):
    """Lists all the blobs in the bucket that begin with the prefix.

    This can be used to list all blobs in a "folder", e.g. "public/".

    The delimiter argument can be used to restrict the results to only the
    "files" in the given "folder". Without the delimiter, the entire tree under
    the prefix is returned. For example, given these blobs:

        a/1.txt
        a/b/2.txt

    If you just specify prefix='a/', you'll get back:

        a/1.txt
        a/b/2.txt

    However, if you specify prefix='a/' and delimiter='/', you'll get back:

        a/1.txt

    and blobs.prefixes will contain the "subfolder" 'a/b/'.

    """
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)

    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)

    print('Blobs:')
    for blob in blobs:
        print(blob.name)

    if delimiter:
        print('Prefixes:')
        # blobs.prefixes is only populated once the iterator above has been consumed
        for sub_prefix in blobs.prefixes:
            print(sub_prefix)
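
Applied to the layout in the question, a minimal sketch might look like the following. It assumes the objects are named like training/bad/<file> with no leading slash. The helper name list_names_under and the extension tuple are hypothetical, not part of the library. The function relies only on bucket.list_blobs(prefix=..., delimiter=...) and each blob's name attribute, so it should accept a real google.cloud.storage bucket.

```python
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumption: adjust to your data


def list_names_under(bucket, folder, extensions=IMAGE_EXTENSIONS):
    """Return the names of image-like objects directly under `folder`.

    `bucket` is expected to be a google.cloud.storage Bucket; anything with a
    compatible list_blobs(prefix=..., delimiter=...) method also works.
    """
    # Normalise to "training/bad/": no leading slash, exactly one trailing
    # slash, so that "training/bad" does not also match "training/bad_old/...".
    prefix = folder.strip("/") + "/"
    blobs = bucket.list_blobs(prefix=prefix, delimiter="/")
    return [
        blob.name
        for blob in blobs
        # Skip the zero-byte "folder" placeholder that some tools create.
        if blob.name != prefix and blob.name.lower().endswith(extensions)
    ]


# Hypothetical usage with the real client (needs credentials and network):
#   from google.cloud import storage
#   bucket = storage.Client().get_bucket("your-bucket-name")
#   print(list_names_under(bucket, "training/bad"))
```

Passing delimiter="/" keeps the result to objects directly under the prefix; drop it if nested "subfolders" should be included as well.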

Upvotes: 9

Dan Cornilescu

Reputation: 39834

What you missed is that objects in a GCS bucket aren't organized in a filesystem-like directory hierarchy, but in a flat namespace.

A more detailed explanation can be found in How Subdirectories Work (in the gsutil context, true, but the fundamental reason is the same - the GCS flat namespace):

gsutil provides the illusion of a hierarchical file tree atop the "flat" name space supported by the Google Cloud Storage service. To the service, the object gs://your-bucket/abc/def.txt is just an object that happens to have "/" characters in its name. There is no "abc" directory; just a single object with the given name.

Since there are no (sub)directories in GCS, /training/bad doesn't really exist as a directory, so you can't open it and list its contents the way you would on a filesystem. What you can do is list the objects in the bucket and select the ones whose names start with training/bad/ (object names normally carry no leading slash).
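
To make the flat namespace concrete, here is a small pure-Python sketch that mimics prefix and delimiter matching over a plain list of object names. It only illustrates the semantics described above and is not how GCS is implemented; the function name list_flat and the sample object names are made up for the example.

```python
def list_flat(names, prefix="", delimiter=None):
    """Mimic prefix/delimiter listing over a flat collection of object names.

    Returns a tuple (objects, prefixes), both sorted lists of strings.
    """
    objects, prefixes = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything past the next delimiter is rolled up into a single
            # "subfolder" entry instead of being returned as an object.
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(name)
    return objects, sorted(prefixes)


# Hypothetical object names; note there is no "training" or "bad" entry at all.
names = [
    "training/bad/1.jpg",
    "training/bad/deep/2.jpg",
    "training/good/3.jpg",
]

# Without a delimiter, the whole "tree" under the prefix comes back.
print(list_flat(names, prefix="training/bad/"))

# With a delimiter, only direct "children" come back, plus the rolled-up prefixes.
print(list_flat(names, prefix="training/bad/", delimiter="/"))
```

The "folders" appear only as a by-product of string matching on names, which is why the get_blob("training/bad") attempts in the question find nothing unless an object with exactly that name exists.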

Upvotes: 11
