Reputation: 541
I'm building a process that will send customizable alerts based on the last-received date of a file in an S3 bucket.
Because my bucket is huge, doing something like this takes a very long time to run:
import boto3

s3 = boto3.resource('s3', aws_access_key_id='demo', aws_secret_access_key='demo')
my_bucket = s3.Bucket('demo')
bucket_items = my_bucket.objects.all()  # enumerates every object in the bucket
I could of course simply do the above and then sort by the last_modified attribute, but I wonder whether there's a more elegant way to sift out just the 100 most recent files at the point the API call is made.
Ideally, I'd also want to be able to customize this even further with search strings - e.g. I might want the 100 most recent files that have ".docx" in the file name, or I might want the most recent files above 1MB in size, etc.
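Just to make the question concrete, here's the brute-force version I'm trying to avoid (the bucket name and filter values are placeholders):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('demo')

# Pull every object's metadata, filter, then sort newest-first -
# this still walks the whole bucket, which is what I'd like to avoid.
candidates = [o for o in bucket.objects.all()
              if o.key.endswith('.docx') and o.size > 1_000_000]
newest_100 = sorted(candidates, key=lambda o: o.last_modified, reverse=True)[:100]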
Just wondering what the best practices are for this kind of querying when the contents of the entire bucket are not needed.
Upvotes: 1
Views: 1024
Reputation: 269282
Your available options are:

- Retrieve a full listing of the objects and do the sorting/filtering in your own code, since the S3 API cannot sort or filter a listing for you, or
- Maintain your own database of objects that you can query directly (a rough sketch of one way to do this follows).
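One common way to maintain such a database (the table name and attribute names here are illustrative, not prescriptive) is to configure an S3 event notification that invokes an AWS Lambda function whenever an object is created, and have the function record the object's key, size, and timestamp in DynamoDB:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('s3-object-index')  # hypothetical table name

def lambda_handler(event, context):
    # Invoked by an S3 ObjectCreated event notification
    for record in event['Records']:
        obj = record['s3']['object']
        table.put_item(Item={
            'key': obj['key'],      # note: keys arrive URL-encoded in the event
            'size': obj['size'],
            'last_modified': record['eventTime'],  # ISO-8601 string, sorts lexicographically
        })

With the index in place, "100 most recent", extension, and size queries become database queries instead of full bucket listings.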
Upvotes: 2
Reputation: 1162
For the 100 most recent files, you can use list_objects in boto3. Each object in the response includes a 'LastModified' field that you can sort on to get the files you need: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects
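As a sketch (the bucket name is a placeholder), using the list_objects_v2 paginator - the paginated variant of the same call, since each request returns at most 1,000 keys:

import boto3

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

objects = []
for page in paginator.paginate(Bucket='demo'):
    objects.extend(page.get('Contents', []))  # each entry has 'Key', 'Size', 'LastModified'

# Sort newest-first and keep the 100 most recent
newest_100 = sorted(objects, key=lambda o: o['LastModified'], reverse=True)[:100]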
For filtering, you can list all the objects as above, apply your conditions in your own code, and then download a matching object using something like this:
import boto3
s3 = boto3.resource('s3')
srcbucket = 'bucket'
srckey = 'object'
obj = s3.Object(srcbucket, srckey)  # a lazy reference; fetch with obj.get() or obj.download_file()
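Putting it together - a sketch (the extension and size cutoffs are just examples) that filters the newest_100 list from the snippet above and downloads each match:

import boto3

s3 = boto3.resource('s3')

for entry in newest_100:  # from the listing sketch above
    if entry['Key'].endswith('.docx') and entry['Size'] > 1_000_000:
        obj = s3.Object('bucket', entry['Key'])
        obj.download_file('/tmp/' + entry['Key'].rsplit('/', 1)[-1])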
Upvotes: 1