Reputation: 8018
I am trying to get blob information from a bucket, but I want to use wildcards in the blob name. Consider my bucket:
$ gsutil ls gs://myBucket/myPath/
gs://myBucket/myPath/
gs://myBucket/myPath/ranOn=2018-12-11/
gs://myBucket/myPath/ranOn=2018-12-12/
gs://myBucket/myPath/ranOn=2018-12-13/
gs://myBucket/myPath/ranOn=2018-12-14/
gs://myBucket/myPath/ranOn=2018-12-15/
gs://myBucket/myPath/ranOn=2019-02-18/
gs://myBucket/myPath/ranOn=2019-02-19/
gs://myBucket/myPath/ranOn=2019-02-20/
gs://myBucket/myPath/ranOn=2019-02-21/
From the command line, I can do:
$ gsutil ls gs://myBucket/myPath/ranOn=2018*
gs://myBucket/myPath/
gs://myBucket/myPath/ranOn=2018-12-11/
gs://myBucket/myPath/ranOn=2018-12-12/
gs://myBucket/myPath/ranOn=2018-12-13/
gs://myBucket/myPath/ranOn=2018-12-14/
gs://myBucket/myPath/ranOn=2018-12-15/
and likewise get the total size:
$ gsutil du -sh gs://myBucket/myPath/ranOn=2018*
2.7 G
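gsutil du -h prints a human-readable total, while the Python client's Blob.size is a plain integer number of bytes. To compare the two, a byte total can be formatted with 1024-based units, the KiB/MiB/GiB style that gsutil's -h flag uses. This is a minimal sketch. human_size is a hypothetical helper, and its rounding is not guaranteed to match gsutil's output exactly:

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units, roughly like `gsutil du -h`.

    Hypothetical helper: the one-decimal rounding is an assumption and may
    differ slightly from what gsutil prints.
    """
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024:
            # Plain bytes are shown as an integer; larger units get one decimal.
            return f"{num_bytes} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} PiB"
```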
Now I want to do the same thing with the Python API. Here is what I tried:
from google.cloud import storage
storage_client = storage.Client()
bucket = storage_client.get_bucket('myBucket')
blob = bucket.get_blob('myPath/ranOn=2018*')
print('Size: {} bytes'.format(blob.size))
Size: None bytes
Why is this not working? How can I use wildcards in blob paths with the Python API?
Upvotes: 4
Views: 3662
Reputation: 21550
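Unfortunately, get_blob only fetches a single object by its exact name. It does not expand wildcards, so 'myPath/ranOn=2018*' is looked up as a literal object name rather than as a pattern.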
Instead, list all the objects whose names start with the prefix and sum their sizes to get the total:
blobs = bucket.list_blobs(prefix="myPath/ranOn=2018")  # every object whose name starts with the prefix
total = sum(blob.size for blob in blobs)  # total size in bytes
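The prefix approach covers a trailing wildcard such as ranOn=2018*, but not a wildcard in the middle of a path. A minimal sketch for that case, assuming fnmatch-style patterns are acceptable: list by the literal part of the pattern before the first wildcard, then filter the names client-side. The helper names (literal_prefix, matching_blobs, total_size) are hypothetical, and bucket is assumed to be a google.cloud.storage Bucket like the one returned by get_bucket in the question. Note that fnmatch's * also matches /, so it behaves more like gsutil's ** than its single *.

```python
import fnmatch
from typing import Any, Iterator

WILDCARD_CHARS = "*?["


def literal_prefix(pattern: str) -> str:
    """Return the part of a glob pattern that precedes its first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in WILDCARD_CHARS:
            return pattern[:i]
    return pattern


def matching_blobs(bucket: Any, pattern: str) -> Iterator[Any]:
    """Yield blobs whose full name matches `pattern` (fnmatch syntax).

    `bucket` is assumed to behave like google.cloud.storage.Bucket: it must
    provide list_blobs(prefix=...) returning objects with .name and .size.
    """
    # Let the server narrow the listing with the literal prefix, then apply
    # the full pattern locally. fnmatchcase keeps matching case-sensitive,
    # like GCS object names.
    for blob in bucket.list_blobs(prefix=literal_prefix(pattern)):
        if fnmatch.fnmatchcase(blob.name, pattern):
            yield blob


def total_size(bucket: Any, pattern: str) -> int:
    """Sum the sizes (in bytes) of all blobs matching `pattern`."""
    return sum(blob.size for blob in matching_blobs(bucket, pattern))
```

With the bucket from the question, total_size(bucket, 'myPath/ranOn=2018*') should return the combined size in bytes of every object under the 2018 folders. Newer releases of google-cloud-storage also accept a match_glob argument to list_blobs that filters on the server side; check whether your installed version supports it before relying on it.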
Upvotes: 3