AbtPst

Reputation: 8018

Google Cloud Storage: get blob information with a wildcard using the Python API

I am trying to get blob information from a bucket, but I want to use wildcards in the blob name. Consider my bucket:

$ gsutil ls gs://myBucket/myPath/
gs://myBucket/myPath/
gs://myBucket/myPath/ranOn=2018-12-11/
gs://myBucket/myPath/ranOn=2018-12-12/
gs://myBucket/myPath/ranOn=2018-12-13/
gs://myBucket/myPath/ranOn=2018-12-14/
gs://myBucket/myPath/ranOn=2018-12-15/
gs://myBucket/myPath/ranOn=2019-02-18/
gs://myBucket/myPath/ranOn=2019-02-19/
gs://myBucket/myPath/ranOn=2019-02-20/
gs://myBucket/myPath/ranOn=2019-02-21/

From the command line, I am able to do:

$ gsutil ls gs://myBucket/myPath/ranOn=2018*
gs://myBucket/myPath/
gs://myBucket/myPath/ranOn=2018-12-11/
gs://myBucket/myPath/ranOn=2018-12-12/
gs://myBucket/myPath/ranOn=2018-12-13/
gs://myBucket/myPath/ranOn=2018-12-14/
gs://myBucket/myPath/ranOn=2018-12-15/

and hence I can get the total size the same way:

$ gsutil du -sh gs://myBucket/myPath/ranOn=2018*
2.7 G

Now I want to do the same thing with the Python API. Here is what I tried:

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('myBucket')
blob = bucket.get_blob('myPath/ranOn=2018*')
print('Size: {} bytes'.format(blob.size))
# Output: Size: None bytes

Why is this not working? How can I use wildcards in blob paths with the Python API?

Upvotes: 4

Views: 3662

Answers (1)

Dustin Ingram

Reputation: 21550

Unfortunately, get_blob only retrieves a single object by its exact name. It does not expand wildcards, so 'myPath/ranOn=2018*' is looked up as a literal blob name.

You'll need to list all the blobs whose names start with the prefix and sum their sizes to get the total size.

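# List every blob whose name starts with the prefix (no wildcard character needed).
blobs = bucket.list_blobs(prefix="myPath/ranOn=2018")

# Sum the sizes, in bytes, of the matching blobs.
total = sum(blob.size for blob in blobs)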
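If the pattern has a wildcard somewhere other than the very end, a plain prefix is not enough. One possible approach, sketched below, is to list by the literal part of the pattern and then filter the names client-side with the standard library's fnmatch. The helper names literal_prefix and total_size, and the FakeBlob stand-in used for the demonstration, are hypothetical and not part of google-cloud-storage; the sketch only assumes that each listed blob exposes .name and .size. Note that fnmatch's `*` also matches `/`, which may differ from how gsutil treats wildcards.

```python
from collections import namedtuple
from fnmatch import fnmatchcase


def literal_prefix(pattern):
    """Return the part of the pattern before the first wildcard character."""
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            return pattern[:i]
    return pattern


def total_size(blobs, pattern):
    """Sum .size over blobs whose .name matches the shell-style pattern."""
    return sum(b.size or 0 for b in blobs if fnmatchcase(b.name, pattern))


# Hypothetical stand-in for real blobs, so the sketch runs without a bucket.
FakeBlob = namedtuple("FakeBlob", ["name", "size"])
sample = [
    FakeBlob("myPath/ranOn=2018-12-11/part-0", 10),
    FakeBlob("myPath/ranOn=2018-12-12/part-0", 20),
    FakeBlob("myPath/ranOn=2019-02-18/part-0", 40),
]

print(total_size(sample, "myPath/ranOn=2018*"))

# Intended use against a real bucket (not executed here):
#   pattern = "myPath/ranOn=2018*"
#   blobs = bucket.list_blobs(prefix=literal_prefix(pattern))
#   print(total_size(blobs, pattern))
```

Listing by the literal prefix first keeps the number of blobs fetched small, and the fnmatch filter then handles whatever the prefix alone cannot express.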

Upvotes: 3
