user3476463

Reputation: 4575

Get day old filepaths from s3 bucket

I have a bunch of files in an s3 bucket with prefixes like the example below. I would like to connect with boto3 and create a list of all prefixes in the bucket that have a date part older than a day. So for example if the current date was

'20191226_1213'

then I would like to create a list like the desired output below. Can anyone suggest how to do this with boto3?

example:

's3://basepath/20191225_1217/'
's3://basepath/20191224_1012/'
's3://basepath/20191222_1114/'

desired output:

['s3://basepath/20191224_1012/','s3://basepath/20191222_1114/']

update:

I'm sorry I didn't provide a better example before, but my real folder path actually looks like:

's3://basepath/folder1/20191225_1217/'

Upvotes: 1

Views: 735

Answers (1)

John Rotenstein

Reputation: 269284

Here's some code that extracts the Common Prefixes in the root of the given bucket and checks their names against "one day ago":

import boto3
import datetime

s3_client = boto3.client('s3')

now = datetime.datetime.now()
comparison_time = now - datetime.timedelta(days=1)
comparison_time_string = comparison_time.strftime("%Y%m%d_%H%M") # eg 20191225_0623

response = s3_client.list_objects_v2(Bucket='my-bucket', Delimiter='/')

for prefix_dict in response['CommonPrefixes']:
    prefix = prefix_dict['Prefix']
    # Folder names like '20191225_1217/' sort lexically, so a plain string comparison works
    if prefix < comparison_time_string:
        print(prefix)
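
Given the updated folder layout in the question (dated folders under 'folder1/'), the same approach can also be pointed at that subfolder by supplying a Prefix. This is only a sketch, assuming the dated folders sit directly under 'folder1/':

response = s3_client.list_objects_v2(Bucket='my-bucket', Prefix='folder1/', Delimiter='/')

for prefix_dict in response.get('CommonPrefixes', []):
    prefix = prefix_dict['Prefix']                   # eg 'folder1/20191225_1217/'
    folder_name = prefix.rstrip('/').split('/')[-1]  # eg '20191225_1217'
    if folder_name < comparison_time_string:
        print(prefix)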

However, be careful about the time definitions. Depending on where you run the code, the timezone might (or might not) be set to UTC. This might, or might not, match whatever is generating those dates and times on the folder names.
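
If the folder names are generated in UTC, one way to make that explicit is to build the cutoff from a timezone-aware clock instead of the local one. A minimal sketch, assuming the timestamps in the folder names really are UTC:

import datetime

# 'now' in UTC, regardless of the machine's local timezone setting
now_utc = datetime.datetime.now(datetime.timezone.utc)
comparison_time_string = (now_utc - datetime.timedelta(days=1)).strftime("%Y%m%d_%H%M")
print(comparison_time_string)  # eg 20191225_0623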

Update: Here's another version that looks for the date string in any part of the Key, then outputs the Key up to the folder name.

import boto3
import datetime
import re

s3_client = boto3.client('s3')

now = datetime.datetime.now()
comparison_time = now - datetime.timedelta(days=1)
comparison_time_string = comparison_time.strftime("%Y%m%d_%H%M") # eg 20191225_0623

response = s3_client.list_objects_v2(Bucket='my-bucket')

pattern = re.compile(r'/(\d{8}_\d{4})/') # eg /20191225_0623/

old_objects = []

for obj in response['Contents']:
    key = obj['Key']
    result = pattern.search(key)
    if result and result.group(1) < comparison_time_string:
        old_objects.append(key[:result.end()])

print(old_objects)
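
Also note that list_objects_v2 returns at most 1000 keys per call. If the bucket holds more objects than that, a paginator can walk every page; a sketch, assuming the same bucket name as above:

import boto3

s3_client = boto3.client('s3')

# The paginator transparently follows continuation tokens across result pages
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket'):
    for obj in page.get('Contents', []):
        print(obj['Key'])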

Upvotes: 3
