Andrew Johnson

Reputation: 13286

How to use boto3 (or other Python) to list the contents of a _RequesterPays_ S3 bucket?

You can download a file via boto3 from a RequesterPays S3 bucket, as follows:

  s3_client.download_file('aws-naip', 'md/2013/1m/rgbir/38077/{}'.format(filename), full_path, {'RequestPayer':'requester'})

What I can't figure out is how to list the objects in the bucket. I get an authentication error when I try to call objects.all() on the bucket.

How can I use boto3 to enumerate the contents of a RequesterPays bucket? Please note this is a particular kind of bucket where the requester pays the S3 charges.

Upvotes: 8

Views: 10669

Answers (3)

Alexis Kanter

Reputation: 1

I had the same issue so here is the code:

import boto3

s3 = boto3.resource('s3')

for bucket in s3.buckets.all():
    print(bucket.name)

client = boto3.client('s3')

result = client.list_objects(Bucket='bucketname', RequestPayer='requester')
# 'Contents' is absent when the listing is empty, so default to an empty list
for o in result.get('Contents', []):
    print(o['Key'])

The response is a dictionary. Its 'Contents' entry is a list of dictionaries, one per object, and each object's path is stored under its 'Key' field. You can check the other response fields in the following link: List_objects documentation

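Note: list_objects returns at most 1000 objects per call, so for a larger bucket you have to call it repeatedly, passing Marker to continue from where the previous page ended, until IsTruncated is false. NextMarker is only returned when a Delimiter is specified; otherwise use the last Key of the page as the next Marker. (I will update this answer if you would like the full list.) I assume you have already figured out how to set up the access key and secret key. Let me know if you need more details on that.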
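As a rough sketch of fetching the full list, boto3's built-in paginator can do the looping instead of tracking markers by hand. The helper name iter_keys and its client argument are my own choices for illustration; it assumes the client was created with boto3.client('s3') and that credentials are already configured.

```python
def iter_keys(client, bucket, prefix=''):
    """Yield every key under prefix in a Requester Pays bucket.

    client is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    iter_keys is a hypothetical helper name, not part of boto3.
    """
    paginator = client.get_paginator('list_objects_v2')
    pages = paginator.paginate(
        Bucket=bucket, Prefix=prefix, RequestPayer='requester')
    for page in pages:
        # 'Contents' is absent on an empty page, so default to an empty list
        for obj in page.get('Contents', []):
            yield obj['Key']


# Example usage (needs boto3 and valid AWS credentials):
#   import boto3
#   for key in iter_keys(boto3.client('s3'), 'bucketname'):
#       print(key)
```

Taking the client as an argument keeps the helper easy to try against a stub before pointing it at a real bucket, where every list request is billed to you.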

Upvotes: 0

perrygeo

Reputation: 385

You have to pass the RequestPayer kwarg to the list_objects method.

Also, according to the boto3 docs,

Note: ListObjectsV2 is the revised List Objects API and we recommend you use this revised API for new application development

Putting that together with pagination would look like:

import boto3
s3_client = boto3.client('s3')

def get_keys(bucket, prefix, requester_pays=False):
    """Get s3 objects from a bucket/prefix
    optionally use requester-pays header
    """
    extra_kwargs = {}
    if requester_pays:
        extra_kwargs = {'RequestPayer': 'requester'}

    next_token = 'init'
    while next_token:
        kwargs = extra_kwargs.copy()
        if next_token != 'init':
            kwargs.update({'ContinuationToken': next_token})

        resp = s3_client.list_objects_v2(
            Bucket=bucket, Prefix=prefix, **kwargs)

        try:
            next_token = resp['NextContinuationToken']
        except KeyError:
            next_token = None

        # 'Contents' is missing when no keys match, so fall back to an empty list
        for contents in resp.get('Contents', []):
            key = contents['Key']
            yield key

and it would be used like this:

x = list(get_keys('aws-naip', 'co', requester_pays=True))
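A resource-level variant may also work, since the question mentions objects.all(). As far as I know, the collection's filter() method forwards its keyword arguments to the underlying list request, so RequestPayer can be passed there, whereas all() takes no parameters. A minimal sketch, where iter_keys_via_resource is a made-up helper name and bucket is assumed to be a boto3 Bucket resource:

```python
def iter_keys_via_resource(bucket, prefix=''):
    """Yield object keys using the boto3 resource API.

    bucket is assumed to be a boto3 Bucket resource.
    iter_keys_via_resource is a hypothetical name used for illustration.
    """
    # filter() passes its keyword arguments on to the list request;
    # the collection handles pagination itself.
    for summary in bucket.objects.filter(
            Prefix=prefix, RequestPayer='requester'):
        yield summary.key


# Example usage (needs boto3 and valid AWS credentials):
#   import boto3
#   bucket = boto3.resource('s3').Bucket('aws-naip')
#   keys = list(iter_keys_via_resource(bucket, 'co'))
```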

Upvotes: 2

Raf

Reputation: 10097

The boto3 documentation lists an S3.Client.list_objects method, which can be used to enumerate objects:

import boto3
s3_client = boto3.client('s3')
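# 'RequesterPays' is a placeholder bucket name; RequestPayer is needed for Requester Pays buckets
resp = s3_client.list_objects(Bucket='RequesterPays', RequestPayer='requester')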

# print names of all objects
for obj in resp['Contents']:
    print('Object Name: %s' % obj['Key'])

Output:

Object Name: pic.gif
Object Name: doc.txt
Object Name: page.html

If you are getting a 403 (Access Denied), make sure that the IAM user calling the API has the s3:ListBucket permission on the bucket; s3:GetObject only covers reading the objects themselves, not listing them.

Upvotes: -1
