Reputation: 170
I have a scenario where multiple files are present in an AWS S3 bucket. I need to be able to pick the most recent file for every file type based on its last modified date. I can also use the numeric part of the file name, as it indicates the hour_yearmonthday when the file was created.
In the listing below, the following two files need to be picked, as they were the last modified ones: File_A_02_20220728.csv and File_B_02_20220728.csv. Any suggestions / snippets on how to do this would be much appreciated.
s3://bucket/File_A_00_20220728.csv
s3://bucket/File_A_01_20220728.csv
s3://bucket/File_A_02_20220728.csv
s3://bucket/File_B_00_20220728.csv
s3://bucket/File_B_01_20220728.csv
s3://bucket/File_B_02_20220728.csv
Upvotes: 0
Views: 2184
Reputation: 269282
There is no in-built function for Amazon S3 to do this for you.
You would need to use list_objects_v2() to list the contents of the bucket. Then, use Python logic/lists/dictionaries to identify the files you want. I would recommend:
For an example of grouping by extension, see: Search S3 bucket for file extension and size
For an example of selecting the 'latest' object, see: How to get the latest file of an S3 bucket using Boto3?
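Putting the two ideas together, here is a minimal sketch rather than a definitive implementation. It assumes every key follows the <type>_<hour>_<yyyymmdd>.csv pattern shown in the question. The names KEY_PATTERN, latest_per_type and list_bucket_objects are made up for illustration; the only boto3 calls used are get_paginator("list_objects_v2") and paginate().

```python
import re

# Assumption: keys look like <type>_<hour>_<yyyymmdd>.csv, as in the question.
KEY_PATTERN = re.compile(r"^(?P<file_type>.+)_(?P<hour>\d{2})_(?P<date>\d{8})\.csv$")


def latest_per_type(objects):
    """Return {file_type: object}, keeping the newest LastModified per type.

    `objects` is any iterable of dicts shaped like the entries in the
    "Contents" list of a list_objects_v2 response ("Key", "LastModified").
    """
    latest = {}
    for obj in objects:
        name = obj["Key"].rsplit("/", 1)[-1]  # drop any prefix ("folder") part
        match = KEY_PATTERN.match(name)
        if match is None:
            continue  # skip keys that do not follow the assumed naming scheme
        file_type = match.group("file_type")
        current = latest.get(file_type)
        if current is None or obj["LastModified"] > current["LastModified"]:
            latest[file_type] = obj
    return latest


def list_bucket_objects(bucket, prefix=""):
    """Yield every object in the bucket, following pagination."""
    import boto3  # imported lazily so latest_per_type() can be used without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects
        yield from page.get("Contents", [])
```

Usage would be along the lines of latest_per_type(list_bucket_objects("bucket")), which for the listing in the question should map File_A and File_B to their 02 files, provided those really have the latest LastModified. If you would rather trust the file name than the timestamp, compare (match.group("date"), match.group("hour")) instead of LastModified.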
Upvotes: 3