Reputation: 39
How can I list all S3 objects uploaded in a specific S3 bucket in the last hour?
I am using the code below to list objects, but the problem is that my_bucket contains more than a million objects:
import boto3

client = boto3.client(
    's3',
    aws_access_key_id=s3_access_id,
    aws_secret_access_key=s3_secret_access_key,
)

get_folder_objects = client.list_objects_v2(
    Bucket='my_bucket',
    Delimiter='',
    EncodingType='url',
    MaxKeys=1000,
    Prefix='api/archive/ls.s3.',
    FetchOwner=False,
    StartAfter=''
)
However, it does not return the results sorted by the last modified date of the S3 objects. My file names follow this format: "ls.s3.fa74a3f1-fc08-4955-809d-f323304f7496.2020-06-29T13.00.part107458.txt"
I have searched everywhere for this kind of question, but no one was able to answer it correctly. Some said that it is not possible at all in Python.
Please help me with this, I shall be highly thankful to you.
Upvotes: 1
Views: 2813
Reputation: 270294
The list_objects_v2() API call will return a maximum of 1000 objects per call.
If the response contains a NextContinuationToken, then you should make the call again, passing this value in ContinuationToken. Alternatively, you can use a paginator that will do this for you.
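As a rough illustration of the paginator approach, here is a minimal sketch. It assumes credentials come from the default boto3 credential chain; `iter_objects` and `list_bucket` are hypothetical helper names, not part of boto3.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_objects(pages: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every object entry from a sequence of list_objects_v2 response pages."""
    for page in pages:
        # "Contents" is absent when a page has no matching keys.
        yield from page.get("Contents", [])


def list_bucket(bucket: str, prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Yield all objects under a prefix; the paginator follows continuation tokens."""
    import boto3  # imported lazily so iter_objects works without boto3 installed

    client = boto3.client("s3")  # assumption: default credential chain
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
    return iter_objects(pages)
```

With the bucket and prefix from the question, this could be used as `for obj in list_bucket("my_bucket", "api/archive/ls.s3."): print(obj["Key"], obj["LastModified"])`. Each page still holds at most 1000 keys, so a bucket with a million objects means roughly a thousand requests.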
The objects will be returned in lexicographical order (effectively alphabetical). Your program will need to look at the result set and filter the results based on the LastModified timestamp returned with each object. It is not possible to request a listing only of objects modified since a certain time. The only filter available is the Prefix.
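The client-side filtering could look something like the sketch below. It assumes each object is a dict with a timezone-aware `LastModified` datetime, which is how boto3 returns entries in the `Contents` list. `modified_within` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List, Optional


def modified_within(
    objects: Iterable[Dict[str, Any]],
    window: timedelta,
    now: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Return objects modified within `window` before `now`, newest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    # The filter runs locally: every listed object is still inspected.
    recent = [obj for obj in objects if obj["LastModified"] >= cutoff]
    return sorted(recent, key=lambda obj: obj["LastModified"], reverse=True)
```

For the last hour, pass `timedelta(hours=1)` as the window. This only reduces what your program keeps, not what S3 returns, so the full listing under the prefix is still fetched.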
See also: How list Amazon S3 bucket contents by modified date?
Since you have so many objects in your bucket, you might consider using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects.
If you need a fast, regular way to retrieve a list of objects, you could maintain a database of objects. The database can be updated by defining an Amazon S3 Event to trigger an AWS Lambda function whenever an object is added/deleted. This involves a lot of overhead, but will provide faster access than calling ListObjects().
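A minimal sketch of such a Lambda handler follows, assuming the function is subscribed to the bucket's object-created and object-removed notifications. The in-memory `OBJECT_INDEX` dict is only a stand-in for a real database table (it would not persist between invocations), and its name is hypothetical.

```python
from typing import Any, Dict, Tuple
from urllib.parse import unquote_plus

# Hypothetical stand-in for a database table: (bucket, key) -> event time.
OBJECT_INDEX: Dict[Tuple[str, str], str] = {}


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    """Apply S3 event notification records to the object index."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        event_name = record["eventName"]
        if "ObjectCreated:" in event_name:
            OBJECT_INDEX[(bucket, key)] = record["eventTime"]
        elif "ObjectRemoved:" in event_name:
            OBJECT_INDEX.pop((bucket, key), None)
    return {"tracked": len(OBJECT_INDEX)}
```

In a real deployment, the two dict operations would be replaced by writes to whatever database you choose, and a query on the stored event time would then answer "what was uploaded in the last hour" without listing the bucket.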
Upvotes: 2