Reputation: 55
this the code I'm using, is there anyway to make it run faster:
src_uri = boto.storage_uri(bucket, google_storage)
for obj in src_uri.get_bucket():
f.write('%s\n' % (obj.name))
Upvotes: 1
Views: 261
Reputation: 95579
This is an example where it pays to use the underlying Google Cloud Storage API more directly, using the Google API Client Library for Python to consume the RESTful HTTP API. With this approach, it is possible to use request batching to retrieve the names of all objects in a single HTTP request (thereby reducing the extra HTTP request overhead) as well as to use field projection with the objects.get operation (by setting &fields=name
) to obtain a partial response so that you aren't sending all the other fields and data over the network (or waiting for retrieval of unnecessary data on the backend).
Code for this would look like:
def get_credentials():
# Your code goes here... checkout the oauth2client documentation:
# http://google-api-python-client.googlecode.com/hg/docs/epy/oauth2client-module.html
# Or look at some of the existing samples for how to do this
def get_cloud_storage_service(credentials):
return discovery.build('storage', 'v1', credentials=credentials)
def get_objects(cloud_storage, bucket_name, autopaginate=False):
result = []
# Actually, it turns out that request batching isn't needed in this
# example, because the objects.list() operation returns not just
# the URL for the object, but also its name, as well. If it had returned
# just the URL, then that would be a case where we'd need such batching.
projection = 'nextPageToken,items(name,selfLink)'
request = cloud_storage.objects().list(bucket=bucket_name, fields=projection)
while request is not None:
response = request.execute()
result.extend(response.items)
if autopaginate:
request = cloud_storage.objects().list_next(request, response)
else:
request = None
return result
def main():
credentials = get_credentials()
cloud_storage = get_cloud_storage_service(credentials)
bucket = # ... your bucket name ...
for obj in get_objects(cloud_storage, bucket, autopaginate=True):
print 'name=%s, selfLink=%s' % (obj.name, obj.selfLink)
You may find the Google Cloud Storage Python Example and other API Client Library Examples helpful in figuring out how to do this. There are also a number of YouTube videos on the Google Developers channel such as Accessing Google APIs: Common code walkthrough that provide walkthroughs.
Upvotes: 2