Reputation: 197
Is there a way to find all files that are older than 100 days in one S3 bucket and move them to a different bucket? Solutions using AWS CLI or SDK both welcome.
In the src bucket, the files are organized like bucket/type/year/month/day/hour/file
S3://my-logs-bucket/logtype/2020/04/30/16/logfile.csv
For instance, on 2020/04/30, log files dated on or before 2020/01/21 will have to be moved.
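In other words, the selection rule is roughly the following (a minimal sketch assuming the key layout above; the sample key is just the example path from the question):

from datetime import datetime, timedelta

# Sketch only: derive the cutoff date (100 days ago) and the date encoded
# in a key laid out as logtype/year/month/day/hour/file.
cutoff = datetime.now() - timedelta(days=100)

key = 'logtype/2020/04/30/16/logfile.csv'
_, year, month, day, hour, _ = key.split('/')
key_date = datetime(int(year), int(month), int(day), int(hour))

# "On or before" the cutoff day means the file should be moved
needs_move = key_date.date() <= cutoff.date()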
Upvotes: 1
Views: 6605
Reputation: 742
Adding on from John's answer: if the objects are not in the root directory of the bucket, a few adjustments to the script need to be made. If they are in the root directory, use John's answer; this script only works when the objects are in a sub-directory. It moves objects from bucket/path/to/objects/ to bucket2/path/to/objects/, assuming you have access to each bucket from the same set of AWS CLI credentials.
import boto3
from datetime import datetime, timedelta

SOURCE_BUCKET = 'bucket-a'
SOURCE_PATH = 'path/to/objects/'
DESTINATION_BUCKET = 'bucket-b'
DESTINATION_PATH = 'path/to/send/objects/'  # <-- you may need to append a filename prefix here so that the paginator doesn't pick up the 'objects' directory itself

s3_client = boto3.client('s3')

# Create a reusable Paginator
paginator = s3_client.get_paginator('list_objects_v2')

# Create a PageIterator from the Paginator; Prefix limits the listing to SOURCE_PATH,
# and the optional PaginationConfig caps the number of objects to iterate over
# (in case you have a lot)
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=SOURCE_PATH, PaginationConfig={'MaxItems': 10000})

# Loop through each object, looking for ones older than a given time period
for page in page_iterator:
    for object in page.get("Contents", []):
        if object['LastModified'] < datetime.now().astimezone() - timedelta(days=100):  # <-- Change time period here
            # Grab the filename from path/to/filename
            FILENAME = object['Key'].rsplit('/', 1)[1]
            # Copy the object to the destination bucket
            s3_client.copy_object(
                Bucket=DESTINATION_BUCKET,
                Key=DESTINATION_PATH + FILENAME,
                CopySource={'Bucket': SOURCE_BUCKET, 'Key': object['Key']}
            )
            # Delete the original object
            s3_client.delete_object(Bucket=SOURCE_BUCKET, Key=object['Key'])
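One caveat not covered by the script: copy_object only works for objects up to 5 GB. If any of your files are larger than that, boto3's managed copy (which performs a multipart copy under the hood) can be swapped in; a rough sketch using the same variables as above:

# Sketch only: for objects larger than 5 GB, replace copy_object with the
# managed transfer method, which handles multipart copy automatically.
s3_client.copy(
    CopySource={'Bucket': SOURCE_BUCKET, 'Key': object['Key']},
    Bucket=DESTINATION_BUCKET,
    Key=DESTINATION_PATH + FILENAME
)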
Upvotes: 0
Reputation: 269340
Here's some Python code that will move objects from Bucket-A to Bucket-B if they are older than a given period.

import boto3
from datetime import datetime, timedelta
SOURCE_BUCKET = 'bucket-a'
DESTINATION_BUCKET = 'bucket-b'
s3_client = boto3.client('s3')
# Create a reusable Paginator
paginator = s3_client.get_paginator('list_objects_v2')
# Create a PageIterator from the Paginator
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET)
# Loop through each object, looking for ones older than a given time period
for page in page_iterator:
    for object in page.get('Contents', []):  # .get() avoids a KeyError on an empty listing
        if object['LastModified'] < datetime.now().astimezone() - timedelta(days=2):  # <-- Change time period here
            print(f"Moving {object['Key']}")
            # Copy the object to the destination bucket
            s3_client.copy_object(
                Bucket=DESTINATION_BUCKET,
                Key=object['Key'],
                CopySource={'Bucket': SOURCE_BUCKET, 'Key': object['Key']}
            )
            # Delete the original object
            s3_client.delete_object(Bucket=SOURCE_BUCKET, Key=object['Key'])
It worked for me, but please test it on less-important data before deploying in production since it deletes objects!
The code uses a paginator in case there are over 1000 objects in the bucket.
You can change the time period as desired.
(In addition to the license granted under the terms of service of this site the contents of this post are licensed under MIT-0.)
Upvotes: 8
Reputation:
As mentioned in my comments, you can create a lifecycle policy for the S3 bucket. Here are the steps to do it: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
Deleting/expiring objects is optional in lifecycle policy rules; you define the actions you want taken on the objects in your S3 bucket.
Lifecycle policies use different storage classes to transition your objects. Before configuring lifecycle policies, I suggest reading up on the different storage classes, as each has its own associated cost: Standard-IA, One Zone-IA, Glacier, and Deep Archive.
For your use case of 100 days, I recommend transitioning your logs to an archive storage class such as S3 Glacier. This might prove to be more cost-effective.
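If you'd rather set the rule from code than the console, the same thing can be done with boto3. Keep in mind that a lifecycle transition changes the storage class of objects within the same bucket; it does not move them to a different bucket. A minimal sketch (the bucket name, rule ID, and logtype/ prefix are assumptions):

import boto3

s3_client = boto3.client('s3')

# Sketch: transition objects under the assumed 'logtype/' prefix to Glacier
# once they are 100 days old. Objects stay in the same bucket.
s3_client.put_bucket_lifecycle_configuration(
    Bucket='my-logs-bucket',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'archive-old-logs',          # assumed rule name
                'Filter': {'Prefix': 'logtype/'},  # assumed prefix
                'Status': 'Enabled',
                'Transitions': [
                    {'Days': 100, 'StorageClass': 'GLACIER'}
                ]
            }
        ]
    }
)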
Upvotes: 1