Reputation: 33
I'm trying to list objects in a GCP Storage bucket by prefix using the Python API client, but I'm having trouble with the prefix parameter. I can do it in gsutil with
gsutil ls -h gs://{{bucket-name}}/*/latest/
But not with the Python API.
I'm using the function from the documentation. I tried passing the prefix parameter as
*/latest/
/*/latest
*
and leaving delimiter as None. I still get no results.
from google.cloud import storage

storage_client = storage.Client()
# Note: Client.list_blobs requires at least package version 1.17.0.
blobs = storage_client.list_blobs(bucket_name, prefix=prefix,
                                  delimiter=delimiter)
print('Blobs:')
for blob in blobs:
    print(blob.name)
# Prefixes are only populated once the blobs iterator has been consumed.
if delimiter:
    print('Prefixes:')
    for prefix in blobs.prefixes:
        print(prefix)
The expected output is
gs://{{bucket-name}}/product/latest/:
gs://{{bucket-name}}/product/latest/health
gs://{{bucket-name}}/product/latest/index.html
Upvotes: 3
Views: 5187
Reputation: 38389
gsutil understands wildcards, but the GCS APIs themselves do not. The APIs only support literal prefixes, so a prefix such as */latest/ is treated as a literal string beginning with an asterisk.
Instead, you'll need to fetch everything and filter it yourself, for example with a regex, which is what gsutil is doing in your example.
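import re

all_blobs = storage_client.list_blobs(bucket_name)
regex = re.compile(r'.*/latest/.*')
# list_blobs yields Blob objects, so match against blob.name rather than the blob itself.
blobs = [blob for blob in all_blobs if regex.match(blob.name)]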
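As a minimal sketch of the client-side filtering step, the helper below works on plain object names so it can be tried without a bucket. The name match_latest and the tighter pattern [^/]+/latest/.* are assumptions for illustration: the pattern is meant to mirror the single-level * in gs://{{bucket-name}}/*/latest/, whereas .*/latest/.* would also match deeper paths.

```python
import re

# Assumed pattern: exactly one path segment, then "latest/", then anything.
LATEST_PATTERN = re.compile(r'[^/]+/latest/.*')


def match_latest(names, pattern=LATEST_PATTERN):
    """Return the object names that fully match the pattern, in input order."""
    return [name for name in names if pattern.fullmatch(name)]


if __name__ == '__main__':
    # Sample names are made up for the demonstration.
    sample = [
        'product/latest/health',
        'product/latest/index.html',
        'product/v1/index.html',
        'a/b/latest/x',
    ]
    for name in match_latest(sample):
        print(name)
```

With the real client, the names would come from something like (blob.name for blob in storage_client.list_blobs(bucket_name)).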
If you are going to have too many objects for this to be practical, I recommend reorganizing your data so that a non-wildcard match comes at the beginning of the path, which lets you filter server-side.
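To illustrate the reorganization idea: if the objects were stored as latest/product/... instead of product/latest/... (a hypothetical layout, not the one in the question), a literal prefix would be enough and the server would do the filtering. The sketch below takes the client as an argument; list_names_with_prefix is a made-up helper name.

```python
def list_names_with_prefix(storage_client, bucket_name, prefix):
    """List object names under a literal prefix; the filtering happens server-side."""
    # The prefix is matched literally: characters such as * get no special treatment.
    blobs = storage_client.list_blobs(bucket_name, prefix=prefix)
    return [blob.name for blob in blobs]
```

Assuming the google-cloud-storage package and the layout above, a call might look like list_names_with_prefix(storage.Client(), bucket_name, 'latest/').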
Upvotes: 4