cloudbud

Reputation: 3278

Move files from one s3 bucket to another in AWS using AWS lambda

I am trying to move files older than an hour from one S3 bucket to another S3 bucket using a Python boto3 AWS Lambda function, for the following cases:

  1. Both buckets are in the same account but different regions.
  2. Both buckets are in different accounts and different regions.
  3. Both buckets are in different accounts but the same region.

I got some help moving the files using the Python code below, provided by @John Rotenstein:

import boto3
from datetime import datetime, timedelta

SOURCE_BUCKET = 'bucket-a'
DESTINATION_BUCKET = 'bucket-b'

s3_client = boto3.client('s3')

# Create a reusable Paginator
paginator = s3_client.get_paginator('list_objects_v2')

# Create a PageIterator from the Paginator
page_iterator = paginator.paginate(Bucket=SOURCE_BUCKET)

# Loop through each object, looking for ones older than a given time period
for page in page_iterator:
    for object in page.get('Contents', []):   # 'Contents' is absent when the bucket is empty
        if object['LastModified'] < datetime.now().astimezone() - timedelta(hours=1):   # <-- Change time period here
            print(f"Moving {object['Key']}")

            # Copy object
            s3_client.copy_object(
                Bucket=DESTINATION_BUCKET,
                Key=object['Key'],
                CopySource={'Bucket':SOURCE_BUCKET, 'Key':object['Key']}
            )

            # Delete original object
            s3_client.delete_object(Bucket=SOURCE_BUCKET, Key=object['Key'])

How can this be modified to cater to these requirements?

Upvotes: 3

Views: 16083

Answers (2)

John Rotenstein

Reputation: 269340

An alternate approach would be to use Amazon S3 Replication, which can replicate bucket contents:

  • Within the same region, or between regions
  • Within the same AWS Account, or between different Accounts

Replication is frequently used when organizations need another copy of their data in a different region, or simply for backup purposes. For example, critical company information can be replicated to another AWS Account that is not accessible to normal users. This way, if some data was deleted, there is another copy of it elsewhere.

Replication requires versioning to be activated on both the source and destination buckets. If you require encryption, use standard Amazon S3 encryption options. The data will also be encrypted during transit.

You configure a source bucket and a destination bucket, then specify which objects to replicate by providing a prefix or a tag. Objects will only be replicated once Replication is activated. Existing objects will not be copied. Deletion is intentionally not replicated to avoid malicious actions. See: What Does Amazon S3 Replicate?
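As a rough illustration, a replication rule might be set up with boto3 along these lines. This is only a sketch: the role ARN, bucket names and prefix are placeholders, and versioning would also need to be enabled on the destination bucket in the same way.

import boto3

s3 = boto3.client('s3')

# Versioning must be enabled on BOTH buckets before replication can be configured
s3.put_bucket_versioning(
    Bucket='bucket-a',
    VersioningConfiguration={'Status': 'Enabled'}
)

# Configure a replication rule on the source bucket
# (the IAM role ARN, prefix and destination bucket ARN below are placeholders)
s3.put_bucket_replication(
    Bucket='bucket-a',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::111111111111:role/replication-role',  # role S3 assumes to replicate
        'Rules': [{
            'ID': 'replicate-to-bucket-b',
            'Priority': 1,
            'Filter': {'Prefix': 'logs/'},                  # replicate only objects under this prefix
            'Status': 'Enabled',
            'DeleteMarkerReplication': {'Status': 'Disabled'},
            'Destination': {'Bucket': 'arn:aws:s3:::bucket-b'}
        }]
    }
)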

There is no "additional" cost for S3 replication, but you will still be charged for Data Transfer when moving objects between regions, and for API requests (which are tiny charges), plus storage of course.

Upvotes: 3

John Rotenstein

Reputation: 269340

Moving between regions

This is a non-issue. You can just copy the object between buckets and Amazon S3 will figure it out.

Moving between accounts

This is a bit harder because the code will use a single set of credentials, which must have ListBucket and GetObject access on the source bucket, plus PutObject rights on the destination bucket.

Also, if credentials are being used from the Source account, then the copy must be performed with ACL='bucket-owner-full-control' otherwise the Destination account won't have access rights to the object. This is not required when the copy is being performed with credentials from the Destination account.

Let's say that the Lambda code is running in Account-A and is copying an object to Account-B. An IAM Role (Role-A) is assigned to the Lambda function. It's pretty easy to give Role-A access to the buckets in Account-A. However, the Lambda function will need permissions to PutObject in the bucket (Bucket-B) in Account-B. Therefore, you'll need to add a bucket policy to Bucket-B that allows Role-A to PutObject into the bucket. This way, Role-A has permission to read from Bucket-A and write to Bucket-B.
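As an illustration, a bucket policy along these lines on Bucket-B would let Role-A write objects into it. The account ID, role name and bucket name are placeholders, not values from the question:

import json
import boto3

s3 = boto3.client('s3')  # credentials from Account-B, which owns Bucket-B

# Placeholder ARNs -- substitute the real Role-A and Bucket-B values
bucket_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'AllowLambdaRoleFromAccountA',
        'Effect': 'Allow',
        'Principal': {'AWS': 'arn:aws:iam::111111111111:role/role-a'},
        'Action': ['s3:PutObject', 's3:PutObjectAcl'],
        'Resource': 'arn:aws:s3:::bucket-b/*'
    }]
}

s3.put_bucket_policy(Bucket='bucket-b', Policy=json.dumps(bucket_policy))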

So, putting it all together:

  • Create an IAM Role (Role-A) for the Lambda function
  • Give the role Read/Write access as necessary for buckets in the same account
  • For buckets in other accounts, add a Bucket Policy that grants the necessary access permissions to the IAM Role (Role-A)
  • In the copy_object() command, include ACL='bucket-owner-full-control' (this is the only coding change needed; see the sketch after this list)
  • Don't worry about doing anything special for cross-region; it should just work automatically
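A rough sketch of the modified copy, reusing the bucket names and client from the question's code; the only change from the original is the ACL parameter:

# Copy object, granting the destination bucket's owner full control
s3_client.copy_object(
    Bucket=DESTINATION_BUCKET,
    Key=object['Key'],
    CopySource={'Bucket': SOURCE_BUCKET, 'Key': object['Key']},
    ACL='bucket-owner-full-control'   # needed when copying with Source-account credentials
)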

Upvotes: 3
