Chuck

Reputation: 1293

AWS S3 object filter to NOT match Prefix in Python script

When iterating over S3 objects using Python/boto3, I see that there's a filter method. But can you apply a NOT condition?

I want to just get the top level objects, not objects in folders (they have a prefix). I am currently doing this and it works:

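import re

import boto3

s3 = boto3.resource('s3')
bucket = cfg['s3']['bucket_name']  # cfg is the script's own config dict

for obj in s3.Bucket(bucket).objects.all():
    # skip every key that starts with the folder name
    if not re.match('folder_name.*', obj.key):
        print(obj.key)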

I see support for a filter like this:

for obj in s3.Bucket(bucket).objects.filter(Prefix=folder_name):

Is there a way to say Prefix != folder_name?

Upvotes: 0

Views: 1555

Answers (2)

MacSanhe

Reputation: 2240

You can do this with s3pathlib. It provides an object-oriented interface, S3Path, so you can write a simple filter function that takes an S3Path as its argument and returns True or False to indicate whether the object should be yielded. Here is an example that solves your problem:

from s3pathlib import S3Path

# use a trailing / to indicate that it is a dir
p_dir = S3Path("bucket", "root_dir/")

# define a filter
n_parts_of_root_dir = len(p_dir.parts)

def top_level_object(s3path):
    return len(s3path.parts) == (n_parts_of_root_dir + 1)

for p in p_dir.iter_objects(limit=100).filter(top_level_object):
    ...  # do whatever you want with p

You can also do more advanced filtering or use the built-in filters out of the box. For example, you can filter by attributes like S3Path.dirname, S3Path.basename, S3Path.fname, and S3Path.ext. See this document for more information.

Upvotes: 0

Anon Coward

Reputation: 10827

If you just want the top-level objects, meaning those without a shared prefix, pass a delimiter to the filter and boto3 will leave out everything under a shared prefix:

s3 = boto3.resource('s3')
for obj in s3.Bucket(bucket).objects.filter(Delimiter='/'):
    print(obj.key)
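
For the more general "Prefix != folder_name" case, the S3 listing API only offers a positive Prefix parameter, so the exclusion has to happen on the client. Below is a minimal sketch of that, assuming the objects only need a `.key` attribute (as boto3's object summaries have). The helper names `without_prefix` and `top_level_only` are made up for illustration, and the `SimpleNamespace` items stand in for real S3 objects so the snippet runs without AWS credentials.

```python
from types import SimpleNamespace


def without_prefix(objects, prefix):
    """Yield only the objects whose key does NOT start with prefix."""
    for obj in objects:
        if not obj.key.startswith(prefix):
            yield obj


def top_level_only(objects, delimiter="/"):
    """Yield only the objects whose key contains no delimiter at all."""
    for obj in objects:
        if delimiter not in obj.key:
            yield obj


# Hypothetical stand-ins for the items returned by bucket.objects.all()
sample = [
    SimpleNamespace(key=k)
    for k in ("a.txt", "folder_name/b.txt", "other/c.txt")
]

print([o.key for o in without_prefix(sample, "folder_name/")])
# ['a.txt', 'other/c.txt']
print([o.key for o in top_level_only(sample)])
# ['a.txt']
```

With a real bucket you would pass `s3.Bucket(bucket).objects.all()` instead of `sample`. Note that this still lists every key in the bucket before filtering, so for the top-level-only case the Delimiter='/' filter above is likely the cheaper option.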

Upvotes: 1
