Aman Bansal

Reputation: 33

Regex with Prefix parameter for list_blobs_with_prefix()

I'm trying to list objects from GCP Storage by prefix using the Python API client, but I'm having trouble with the prefix parameter. I can do it in gsutil with

gsutil ls -h gs://{{bucket-name}}/*/latest/

But not with the Python API.

I'm using the function from the documentation. I tried passing the prefix parameter as

*/latest/ /*/latest *

and leaving delimiter as None. I still don't get any results.

from google.cloud import storage

def list_blobs_with_prefix(bucket_name, prefix, delimiter=None):
    storage_client = storage.Client()

    # Note: Client.list_blobs requires at least package version 1.17.0.
    blobs = storage_client.list_blobs(bucket_name, prefix=prefix,
                                      delimiter=delimiter)

    print('Blobs:')
    for blob in blobs:
        print(blob.name)

    if delimiter:
        print('Prefixes:')
        for prefix in blobs.prefixes:
            print(prefix)

The expected output is

gs://{{bucket-name}}/product/latest/:
gs://{{bucket-name}}/product/latest/health
gs://{{bucket-name}}/product/latest/index.html

Upvotes: 3

Views: 5187

Answers (1)

Brandon Yarbrough

Reputation: 38389

gsutil understands wildcards such as *, but the prefix parameter in the GCS APIs does not. It is matched only as a literal prefix.

Instead, you'll need to fetch everything and filter with the regex yourself, which is what gsutil is doing in your example.

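import re

all_blobs = storage_client.list_blobs(bucket_name)
regex = re.compile(r'.*/latest/.*')
# regex.match needs a string, so test each blob's name rather than the Blob object.
blobs = [blob for blob in all_blobs if regex.match(blob.name)]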
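
For a slightly fuller picture, here is a minimal sketch of the same client-side filtering with the matching logic pulled out so it can be tried on plain strings. The names filter_names, list_matching_blobs and LATEST_PATTERN are made up for illustration, and the pattern assumes you want exactly one path component before latest/, which is my reading of the gsutil wildcard in the question rather than something confirmed there.

```python
import re
from typing import Iterable, Iterator

# Hypothetical pattern: one path component (no "/"), then "latest/".
LATEST_PATTERN = re.compile(r"[^/]+/latest/")


def filter_names(names: Iterable[str], pattern=LATEST_PATTERN) -> Iterator[str]:
    """Yield only the names that match the pattern from their start."""
    return (name for name in names if pattern.match(name))


def list_matching_blobs(bucket_name: str, pattern=LATEST_PATTERN):
    """List every blob in the bucket and keep those whose name matches."""
    # Imported here so filter_names can be tried without the library installed.
    from google.cloud import storage

    client = storage.Client()
    return [blob for blob in client.list_blobs(bucket_name)
            if pattern.match(blob.name)]
```

Calling list_matching_blobs(bucket_name) still downloads the full listing, so the cost grows with the size of the bucket.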

If you are going to have too many objects to make this worthwhile, I recommend reorganizing your data in a way that allows you to put a non-wildcard match at the beginning of the path, so that you can filter server-side.
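
As a rough sketch of that idea: if the path starts with a literal part, you can send that part as the prefix and only match the rest locally. The helpers literal_prefix and list_glob below are hypothetical names, not part of the library, and a layout like product/*/latest/* is just an assumed example. Python's fnmatch is used for the local match; its * also matches "/", which differs from gsutil.

```python
import fnmatch

WILDCARDS = "*?["


def literal_prefix(glob: str) -> str:
    """Return the part of the glob before its first wildcard character."""
    for i, ch in enumerate(glob):
        if ch in WILDCARDS:
            return glob[:i]
    return glob


def list_glob(client, bucket_name: str, glob: str):
    """Narrow the listing server-side by literal prefix, then match locally.

    `client` is expected to be a google.cloud.storage.Client.
    """
    # An empty literal prefix means the whole bucket has to be listed.
    prefix = literal_prefix(glob) or None
    blobs = client.list_blobs(bucket_name, prefix=prefix)
    # fnmatch's "*" crosses "/" boundaries, unlike gsutil's "*".
    return [blob for blob in blobs if fnmatch.fnmatchcase(blob.name, glob)]
```

With a pattern like */latest/ the literal prefix is empty, so nothing is saved. That is why moving the fixed part of the path to the front matters.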

Upvotes: 4
