I have a bunch of files in an S3 bucket with prefixes like the example below. I would like to connect with boto3 and create a list of all prefixes in the bucket whose date part is more than a day old. For example, if the current date were
'20191226_1213'
then I would like to get the list shown in the desired output below. Can anyone suggest how to do this with boto3?
example:
's3://basepath/20191225_1217/'
's3://basepath/20191224_1012/'
's3://basepath/20191222_1114/'
desired output:
['s3://basepath/20191224_1012/','s3://basepath/20191222_1114/']
Update:
Sorry for not providing a better example before. My real folder paths actually look like this:
's3://basepath/folder1/20191225_1217/'
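The filtering rule itself can be written independently of any S3 calls: take the last path segment, parse it as a timestamp, and keep the path if that timestamp is earlier than one day before "now". This is a minimal sketch, assuming every date folder uses the `YYYYmmdd_HHMM` format shown above; `older_than_a_day` is a hypothetical helper name. Parsing with `datetime.strptime` means segments that are not dates (such as `folder1`) are skipped instead of being compared as strings, and taking the last segment handles both the flat and the nested layouts.

```python
import datetime

DATE_FORMAT = "%Y%m%d_%H%M"  # matches folder names such as 20191225_1217


def older_than_a_day(uris, now):
    """Return the URIs whose last path segment is a date more than one day before `now`."""
    cutoff = now - datetime.timedelta(days=1)
    result = []
    for uri in uris:
        # 's3://basepath/folder1/20191225_1217/' -> '20191225_1217'
        segment = uri.rstrip('/').rsplit('/', 1)[-1]
        try:
            stamp = datetime.datetime.strptime(segment, DATE_FORMAT)
        except ValueError:
            continue  # last segment is not a date folder; skip it
        if stamp < cutoff:
            result.append(uri)
    return result


now = datetime.datetime.strptime('20191226_1213', DATE_FORMAT)
uris = [
    's3://basepath/20191225_1217/',
    's3://basepath/20191224_1012/',
    's3://basepath/20191222_1114/',
]
print(older_than_a_day(uris, now))
# ['s3://basepath/20191224_1012/', 's3://basepath/20191222_1114/']
```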
Here's some code that lists the common prefixes (the "folders") in the root of the given bucket and compares their names against a timestamp from one day ago:
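import boto3
import datetime

s3_client = boto3.client('s3')

now = datetime.datetime.now()
comparison_time = now - datetime.timedelta(days=1)
comparison_time_string = comparison_time.strftime("%Y%m%d_%H%M")  # eg 20191225_0623

# Delimiter='/' makes S3 return the top-level "folders" as CommonPrefixes
response = s3_client.list_objects_v2(Bucket='my-bucket', Delimiter='/')
# .get() avoids a KeyError when the bucket root has no folders
for prefix_dict in response.get('CommonPrefixes', []):
    prefix = prefix_dict['Prefix']  # eg '20191225_1217/'
    # Fixed-width YYYYmmdd_HHMM strings sort chronologically, so < works
    if prefix < comparison_time_string:
        print(prefix)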
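For the nested layout from the question's update (`folder1/20191225_1217/`), the same `Delimiter='/'` idea can be applied one level down by also passing `Prefix`. A single `list_objects_v2` call returns at most 1,000 results, so this sketch uses boto3's paginator. It assumes the bucket is named `basepath` and the date folders sit directly under `folder1/`; `list_prefixes` is a hypothetical helper that takes the client as a parameter so it can be exercised without AWS credentials.

```python
def list_prefixes(s3_client, bucket, parent_prefix=''):
    """Yield each "folder" directly under parent_prefix as an s3:// URI."""
    paginator = s3_client.get_paginator('list_objects_v2')
    # Delimiter='/' groups keys by the next '/' after Prefix, so each
    # page lists the immediate sub-folders in its CommonPrefixes
    for page in paginator.paginate(Bucket=bucket, Prefix=parent_prefix, Delimiter='/'):
        for common_prefix in page.get('CommonPrefixes', []):
            yield f"s3://{bucket}/{common_prefix['Prefix']}"


# Usage (assumes a bucket named 'basepath' with date folders under 'folder1/'):
#   import boto3
#   for uri in list_prefixes(boto3.client('s3'), 'basepath', 'folder1/'):
#       print(uri)  # eg s3://basepath/folder1/20191225_1217/
```

The yielded URIs can then be filtered by the date in their last segment, just as the loop above filters the root-level prefixes.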
However, be careful about time zones. `datetime.datetime.now()` returns local time, and depending on where you run the code, the local time zone may or may not be UTC. That may or may not match the time zone used by whatever generates the dates and times in the folder names.
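If the folder names are generated in UTC (an assumption; check whatever writes them), one option is to build the cutoff string from an explicit UTC clock instead of `datetime.datetime.now()`. This is a minimal sketch; `cutoff_string` is a hypothetical helper name, and an aware datetime passed in from any other zone is converted to UTC first.

```python
import datetime

FOLDER_FORMAT = "%Y%m%d_%H%M"  # eg 20191225_0623


def cutoff_string(now=None, days=1):
    """Return the folder-style timestamp for `days` before `now`, expressed in UTC."""
    if now is None:
        # Explicit UTC clock instead of the machine's local time zone
        now = datetime.datetime.now(datetime.timezone.utc)
    # Convert to UTC (note: astimezone treats a naive datetime as local time)
    now = now.astimezone(datetime.timezone.utc)
    return (now - datetime.timedelta(days=days)).strftime(FOLDER_FORMAT)


utc_plus_one = datetime.timezone(datetime.timedelta(hours=1))
print(cutoff_string(datetime.datetime(2019, 12, 26, 13, 13, tzinfo=utc_plus_one)))
# 20191225_1213
```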
Update: Here's another version that looks for the date string anywhere in the key, then outputs the key up to and including the date folder name.
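import boto3
import datetime
import re

s3_client = boto3.client('s3')

now = datetime.datetime.now()
comparison_time = now - datetime.timedelta(days=1)
comparison_time_string = comparison_time.strftime("%Y%m%d_%H%M")  # eg 20191225_0623

# Raw string so that \d is passed to the regex engine unchanged
pattern = re.compile(r'/(\d{8}_\d{4})/')  # eg /20191225_0623/

old_objects = []
# list_objects_v2 returns at most 1000 keys per call, so use a paginator
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket'):
    for obj in page.get('Contents', []):
        key = obj['Key']
        result = pattern.search(key)
        if result and result.group(1) < comparison_time_string:
            folder = key[:result.end()]  # key up to and including the date folder
            # Many keys can share one folder; record each folder only once
            if folder not in old_objects:
                old_objects.append(folder)

print(old_objects)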
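The question asks for full `s3://...` strings, while the loop above collects bare key prefixes. This is a minimal sketch of the same regex filter as a pure function, assuming you already have the list of keys and know the bucket name, so the logic can be checked without touching S3. `old_folder_uris` and the sample keys are hypothetical.

```python
import re

# Same pattern as the answer: 8-digit date, '_', 4-digit time, between slashes
PATTERN = re.compile(r'/(\d{8}_\d{4})/')


def old_folder_uris(bucket, keys, cutoff):
    """Return s3:// URIs of the date folders older than `cutoff` that hold the given keys."""
    folders = []
    for key in keys:
        match = PATTERN.search(key)
        if match and match.group(1) < cutoff:
            uri = f"s3://{bucket}/{key[:match.end()]}"
            if uri not in folders:  # several keys can live in one folder
                folders.append(uri)
    return folders


keys = [  # hypothetical keys
    'folder1/20191224_1012/a.csv',
    'folder1/20191224_1012/b.csv',
    'folder1/20191225_1217/c.csv',
]
print(old_folder_uris('basepath', keys, '20191225_1213'))
# ['s3://basepath/folder1/20191224_1012/']
```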