marsolmos

Reputation: 794

How to download latest n items from AWS S3 bucket using boto3?

I have an S3 bucket where my application saves some final result DataFrames as .csv files. I would like to download the latest 1000 files in this bucket, but I don't know how to do it.

I can't do it manually, because the bucket doesn't let me sort the files by date when it has more than 1000 objects.

[Screenshot: bucket limit size for sorting]

I've seen some questions with solutions that would work using the AWS CLI, but I don't have enough user permissions to use it, so I have to do this with a boto3 Python script that I'm going to upload to a Lambda.

How can I do this?

Upvotes: 1

Views: 817

Answers (2)

goncuesma

Reputation: 121

If your application uploads files periodically, you could try this:

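import datetime

import boto3

last_n_days = 250
s3 = boto3.client('s3')

paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='processed')
# LastModified is timezone-aware (UTC), so the cutoff must be timezone-aware too
date_limit = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=last_n_days)
for page in pages:
    # 'Contents' is missing when a page has no matching keys
    for obj in page.get('Contents', []):
        # Skip folder placeholder keys, which end in '/'
        if obj['LastModified'] >= date_limit and not obj['Key'].endswith('/'):
            s3.download_file('bucket', obj['Key'], obj['Key'].split('/')[-1])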

The script above downloads every file modified in the last 250 days. If your application uploads 4 files per day, that covers roughly the latest 1000 files (4 × 250), so this could do the trick.
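
If you need exactly the latest N files no matter how often the application uploads, another option is to list every key, keep only the N objects with the newest LastModified, and download those. S3 returns listings in key order, not date order, so the whole prefix still has to be listed. This is only a rough sketch: the function names (latest_n, iter_objects, download_latest_n) and the /tmp destination are my own placeholders, chosen because Lambda only lets you write under /tmp.

```python
import heapq
import os


def latest_n(objects, n):
    """Return the n objects with the newest LastModified, newest first."""
    return heapq.nlargest(n, objects, key=lambda obj: obj["LastModified"])


def iter_objects(s3, bucket, prefix=""):
    """Yield every object under prefix, skipping folder placeholder keys."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing when a page has no matching keys
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith("/"):
                yield obj


def download_latest_n(bucket, n, prefix="", dest_dir="/tmp", s3=None):
    """Download the n most recently modified objects into dest_dir."""
    if s3 is None:
        import boto3  # imported lazily so the helpers above work without it
        s3 = boto3.client("s3")
    downloaded = []
    for obj in latest_n(iter_objects(s3, bucket, prefix), n):
        target = os.path.join(dest_dir, os.path.basename(obj["Key"]))
        s3.download_file(bucket, obj["Key"], target)
        downloaded.append(target)
    return downloaded
```

Something like download_latest_n('bucket', 1000, prefix='processed') would then fetch the 1000 newest files. Note that files with the same name under different prefixes would overwrite each other in dest_dir, so adjust the target path if that can happen in your bucket.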

Upvotes: 2

Parsifal

Reputation: 4486

The best solution is to redefine your problem: rather than retrieving the N most recent files, retrieve all files from the N most recent days. I think that you'll find this to be a better solution in most cases.

However, to make it work you'll need to adopt some form of date-stamped prefix for the uploaded files. For example, 2021-04-16/myfile.csv.

If you feel that you must retrieve N files, then you can use the prefix to retrieve only a portion of the list. Assuming you know that approximately 100 files are uploaded per day, start your bucket listing at 2021-04-05/.
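
As a rough illustration of the prefix idea, assuming keys are written as YYYY-MM-DD/filename: the helper names below (dated_key, start_prefix, keys_from) are hypothetical, and the one extra day of margin is my own assumption to absorb days with fewer uploads. The listing uses the StartAfter parameter of list_objects_v2, which returns only keys that sort after the given string.

```python
import datetime


def dated_key(filename, day=None):
    """Build a key such as '2021-04-16/myfile.csv' (UTC date by default)."""
    day = day or datetime.datetime.now(datetime.timezone.utc).date()
    return f"{day.isoformat()}/{filename}"


def start_prefix(n_files, files_per_day, today=None, margin_days=1):
    """Estimate the date prefix to start listing from to cover n_files."""
    today = today or datetime.datetime.now(datetime.timezone.utc).date()
    days_back = -(-n_files // files_per_day) + margin_days  # ceiling division
    return f"{(today - datetime.timedelta(days=days_back)).isoformat()}/"


def keys_from(s3, bucket, start):
    """Yield keys that sort after `start`, i.e. from that date onward."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, StartAfter=start):
        # 'Contents' is missing when a page has no matching keys
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With boto3, keys_from(boto3.client('s3'), 'bucket', start_prefix(1000, 100)) would list only the last few days of uploads instead of the whole bucket. This relies on ISO dates sorting lexicographically in chronological order, which is why the YYYY-MM-DD format matters.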

Upvotes: 0
