Reputation: 2215
It seems to me that google.cloud.storage.Client.list_blobs returns an HTTPIterator, which is not a proper Python iterator. See below:
import google.cloud.storage as gcs
client = gcs.Client()
blobs = client.list_blobs("mybucket")
blob = next(blobs) # TypeError: 'HTTPIterator' object is not an iterator
blob = blobs.__next__() # AttributeError: 'HTTPIterator' object has no attribute '__next__'
I'm looking for a solution that does not iterate through the entire iterator. The only solution I can come up with is a silly hack: a for loop that breaks after the first iteration.
Upvotes: 3
Views: 4352
Reputation: 81414
Without understanding the details of a Page Iterator, you can simply convert the iterator to a list:
blobs = client.list_blobs(bucketName)
blob_list = list(blobs)
# First blob
blob_list[0].name
# Second blob
blob_list[1].name
# Of course you can check the number of list items with len()
count = len(blob_list)
In reality, it is important to understand that list_blobs() does not fetch everything at once. Typically, the library fetches up to 1,000 objects per request. This is called paging. For a bucket with 1,500 objects, iteration fetches two pages (1,000 objects, then 500). A page may, however, contain fewer than 1,000 objects.
blobs = client.list_blobs(bucketName)
for page in blobs.pages:
    # page_number is tracked on the iterator; num_items is the size of the current page
    print('Page number:', blobs.page_number)
    print('Count:', page.num_items)
Output:
Page number: 1
Count: 1000
Page number: 2
Count: 500
When you convert a Page Iterator to a list, all of the objects are fetched. For large buckets, this could take a substantial amount of time just to display the first two objects.
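If you only need the first object, you can avoid the full fetch by asking for a real iterator with iter() and pulling a single item with next(). The sketch below does not talk to Cloud Storage; FakePagedListing is a hypothetical stand-in I made up to mimic an object that is iterable but has no __next__, and it counts simulated page fetches so the difference is visible.

```python
from typing import Iterator, List


class FakePagedListing:
    """Hypothetical stand-in for a paged listing (not the real HTTPIterator).

    It is iterable (defines __iter__) but deliberately has no __next__,
    and it counts how many pages have been "fetched".
    """

    def __init__(self, total_items: int, page_size: int = 1000) -> None:
        self.total_items = total_items
        self.page_size = page_size
        self.pages_fetched = 0

    def _fetch_page(self, start: int) -> List[str]:
        # Simulates one API round trip returning up to page_size names.
        self.pages_fetched += 1
        end = min(start + self.page_size, self.total_items)
        return [f"object-{i}" for i in range(start, end)]

    def __iter__(self) -> Iterator[str]:
        # Generator: a page is fetched only when the caller asks for more items.
        start = 0
        while start < self.total_items:
            page = self._fetch_page(start)
            yield from page
            start += len(page)


# Converting to a list walks every page.
eager = FakePagedListing(total_items=1500)
all_names = list(eager)
print(len(all_names), eager.pages_fetched)  # 1500 2

# iter() returns a real iterator; next() then pulls one item and stops.
lazy = FakePagedListing(total_items=1500)
first_name = next(iter(lazy), None)  # None if the listing is empty
print(first_name, lazy.pages_fetched)  # object-0 1
```

With the real client, the equivalent call would presumably be next(iter(client.list_blobs("mybucket")), None), assuming the returned object supports iter() as any for-loop-compatible object must. list_blobs() also accepts a max_results argument, which should cap how many objects are returned if you know you only want one.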
For a better understanding, study the source code for the Page Iterator.
Upvotes: 3