Emile

Reputation: 197

Move S3 files older than 100 days to another bucket

Is there a way to find all files that are older than 100 days in one S3 bucket and move them to a different bucket? Solutions using AWS CLI or SDK both welcome. In the src bucket, the files are organized like bucket/type/year/month/day/hour/file
S3://my-logs-bucket/logtype/2020/04/30/16/logfile.csv
For instance, on 2020/04/30, log files dated on or before 2020/01/21 would have to be moved.
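
As a quick check of that cutoff:

from datetime import date, timedelta

# 2020/04/30 minus 100 days
print(date(2020, 4, 30) - timedelta(days=100))  # 2020-01-21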

Upvotes: 1

Views: 6605

Answers (3)

nealous3

Reputation: 742

Adding on to John's answer: if the objects are not in the root directory of the bucket, a few adjustments to his script are needed. If they are in the root directory, use John's answer; this script only works when the objects are in a sub-directory. It moves objects from bucket/path/to/objects/ to bucket2/path/to/objects/, assuming you have access to each bucket from the same set of AWS CLI credentials.

import boto3
from datetime import datetime, timedelta

SOURCE_BUCKET = 'bucket-a'
SOURCE_PATH = 'path/to/objects/'
DESTINATION_BUCKET = 'bucket-b'
DESTINATION_PATH = 'path/to/send/objects/'  # <-- you may need to add a filename prefix to the end so that the paginator doesn't look at the 'objects' directory

s3_client = boto3.client('s3')

# Create a reusable Paginator
paginator = s3_client.get_paginator('list_objects_v2')

# Create a PageIterator from the Paginator, with a Prefix argument and an optional PaginationConfig argument
# to control the number of objects you want to iterate over (in case you have a lot)
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=SOURCE_PATH, PaginationConfig={'MaxItems':10000})

# Loop through each object, looking for ones older than a given time period
for page in page_iterator:
    for object in page.get("Contents", []):
        if object['LastModified'] < datetime.now().astimezone() - timedelta(days=100):   # <-- Change time period here
            
            # grab filename from path/to/filename
            FILENAME = object['Key'].rsplit('/',1)[1]

            # Copy object
            s3_client.copy_object(
                Bucket=DESTINATION_BUCKET,
                Key=DESTINATION_PATH+FILENAME,
                CopySource={'Bucket':SOURCE_BUCKET, 'Key':object['Key']}
            )

            # Delete original object
            s3_client.delete_object(Bucket=SOURCE_BUCKET, Key=object['Key'])
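
One caveat: copy_object only works for objects up to 5 GB. If any files are larger than that, boto3's managed copy can be swapped in for the copy_object call inside the loop (a sketch reusing the variable names from the script above):

# Multipart-capable managed copy; handles objects larger than 5 GB
s3_client.copy(
    {'Bucket': SOURCE_BUCKET, 'Key': object['Key']},
    DESTINATION_BUCKET,
    DESTINATION_PATH + FILENAME
)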

Upvotes: 0

John Rotenstein

Reputation: 269340

Here's some Python code that will:

  • Move files from Bucket-A to Bucket-B if they are older than a given period
  • Full names and paths will be retained
import boto3
from datetime import datetime, timedelta

SOURCE_BUCKET = 'bucket-a'
DESTINATION_BUCKET = 'bucket-b'

s3_client = boto3.client('s3')

# Create a reusable Paginator
paginator = s3_client.get_paginator('list_objects_v2')

# Create a PageIterator from the Paginator
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET)

# Loop through each object, looking for ones older than a given time period
for page in page_iterator:
    for object in page.get('Contents', []):  # .get() avoids a KeyError if the bucket is empty
        if object['LastModified'] < datetime.now().astimezone() - timedelta(days=2):   # <-- Change time period here
            print(f"Moving {object['Key']}")

            # Copy object
            s3_client.copy_object(
                Bucket=DESTINATION_BUCKET,
                Key=object['Key'],
                CopySource={'Bucket':SOURCE_BUCKET, 'Key':object['Key']}
            )

            # Delete original object
            s3_client.delete_object(Bucket=SOURCE_BUCKET, Key=object['Key'])

It worked for me, but please test it on less-important data before deploying in production since it deletes objects!

The code uses a paginator in case there are over 1000 objects in the bucket.

You can change the time period as desired.
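
For the question as asked you would change days=2 to days=100, and if only one log type from the logtype/year/month/day/hour layout should be moved, the listing can be limited with a Prefix argument (the 'logtype/' value here is just a placeholder):

# Restrict the listing to a single log type
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET, Prefix='logtype/')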

(In addition to the license granted under the terms of service of this site the contents of this post are licensed under MIT-0.)

Upvotes: 8

user7548672

Reputation:

As mentioned in my comments, you can create a lifecycle policy for an S3 bucket. Here are the steps to do it: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html

It's optional to delete/expire objects using lifecycle policy rules; you define the actions you want applied to the objects in your S3 bucket.

Lifecycle policies use different storage classes to transition your objects. Before configuring lifecycle policies, I suggest reading up on the different storage classes, as each has its own associated cost: Standard-IA, One Zone-IA, Glacier, and Deep Archive.

For your use case of 100 days, I recommend transitioning your logs to an archive storage class such as S3 Glacier. This might prove to be more cost-effective.
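
As a rough sketch of such a rule with boto3 (the bucket name and prefix below come from the question and are just placeholders to adjust):

import boto3

s3_client = boto3.client('s3')

# Transition objects under the given prefix to Glacier 100 days after creation
s3_client.put_bucket_lifecycle_configuration(
    Bucket='my-logs-bucket',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'archive-old-logs',
                'Filter': {'Prefix': 'logtype/'},
                'Status': 'Enabled',
                'Transitions': [
                    {'Days': 100, 'StorageClass': 'GLACIER'}
                ]
            }
        ]
    }
)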

Upvotes: 1
