Reputation: 1293
When iterating over S3 objects using Python/boto3, I see that there's a filter method. But can you apply a NOT condition?
I want to just get the top level objects, not objects in folders (they have a prefix). I am currently doing this and it works:
import re
import boto3

s3 = boto3.resource('s3')
bucket = cfg['s3']['bucket_name']
for obj in s3.Bucket(bucket).objects.all():
    # skip every key that starts with the folder name
    if not re.match('folder_name.*', obj.key):
        print(obj.key)
I see support for a filter like this:
for obj in s3.Bucket(bucket).objects.filter(Prefix=folder_name):
My question: is there a way to say Prefix != folder_name?
Upvotes: 0
Views: 1555
Reputation: 2240
You can do this with s3pathlib. It provides an object-oriented interface, S3Path, so you can easily write a simple filter function that takes an S3Path as its argument and returns True or False to indicate whether that object should be yielded. Here is an example that solves your problem:
from s3pathlib import S3Path

# use a trailing / to indicate that it is a dir
p_dir = S3Path("bucket", "root_dir/")

# define a filter
n_parts_of_root_dir = len(p_dir.parts)

def top_level_object(s3path):
    return len(s3path.parts) == (n_parts_of_root_dir + 1)

for p in p_dir.iter_objects(limit=100).filter(top_level_object):
    ...  # do whatever you want
You can also do more advanced filtering, or use the built-in filters out of the box. For example, you can filter by attributes like S3Path.dirname, S3Path.basename, S3Path.fname, and S3Path.ext. See this document for more information.
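For comparison, the same "direct children only" check can be sketched on plain key strings, without s3pathlib. This is only an illustration of the idea behind the filter above: `is_direct_child` is a hypothetical helper name, and the keys in the list are made up for the demo.

```python
def is_direct_child(key: str, dir_prefix: str) -> bool:
    """Return True if key sits directly under dir_prefix, not in a nested "folder"."""
    # Normalise the prefix so "root_dir" and "root_dir/" behave the same.
    if not dir_prefix.endswith("/"):
        dir_prefix += "/"
    if not key.startswith(dir_prefix):
        return False
    remainder = key[len(dir_prefix):]
    # An empty remainder is the directory placeholder itself;
    # a "/" in the remainder means the key is nested deeper.
    return remainder != "" and "/" not in remainder


# Made-up keys, standing in for obj.key values from a bucket listing.
keys = ["root_dir/", "root_dir/a.txt", "root_dir/sub/b.txt", "other/c.txt"]
direct_children = [k for k in keys if is_direct_child(k, "root_dir/")]
print(direct_children)
```

With a real bucket you would feed it `obj.key` from whatever listing you already iterate over.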
Upvotes: 0
Reputation: 10827
If you just want a list of the objects without a shared prefix, pass a delimiter to the filter. Keys that contain the delimiter are then grouped under their common prefix and left out of the results, so only the top-level objects are returned:
import boto3

s3 = boto3.resource('s3')
# bucket is the bucket name, as in the question
for obj in s3.Bucket(bucket).objects.filter(Delimiter='/'):
    print(obj.key)
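For the more general "Prefix != folder_name" case: as far as I know the S3 list call only accepts a positive Prefix, so the exclusion has to happen client-side, much like the regex in the question. A minimal sketch, assuming the objects only need a `.key` attribute; `exclude_prefix` is a hypothetical helper (not part of boto3), and the fake objects below just stand in for what `s3.Bucket(bucket).objects.all()` would yield.

```python
from types import SimpleNamespace


def exclude_prefix(objects, prefix):
    """Yield only the objects whose key does NOT start with prefix."""
    for obj in objects:
        if not obj.key.startswith(prefix):
            yield obj


# Stand-ins for ObjectSummary items; only the .key attribute is used.
fake_objects = [
    SimpleNamespace(key=k)
    for k in ["a.txt", "folder_name/b.txt", "folder_name_2/c.txt", "other/d.txt"]
]

kept_keys = [o.key for o in exclude_prefix(fake_objects, "folder_name/")]
print(kept_keys)
```

Note the trailing slash in "folder_name/": without it, keys under "folder_name_2/" would be excluded as well. This still lists every object in the bucket and discards the unwanted ones afterwards, so it saves no requests.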
Upvotes: 1