Hyder Tom

Reputation: 373

IAM role and Keys setup for S3 AWS accessing two different account buckets using boto3

I have two different accounts: 1) a vendor account, for which the vendor gave us an Access ID and secret key, and 2) our own account, where we have full access.

We need to copy files from the vendor's S3 bucket to our S3 bucket using boto3 with Python 3.7 scripts.

What is the best function in boto3 to use to get the best performance?

I tried using get_object and put_object. The problem with that approach is that I am actually reading the file body and writing it back out. How do we copy straight from one account to the other in a faster mode?
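For reference, here is roughly what I tried (bucket names, keys, and credential values are placeholders):

import boto3

# Vendor account client, built from the access keys the vendor gave us
vendor_s3 = boto3.client(
    's3',
    aws_access_key_id='VENDOR_ACCESS_ID',        # placeholder
    aws_secret_access_key='VENDOR_SECRET_KEY',   # placeholder
)
# Our account client, using our default credentials
our_s3 = boto3.client('s3')

# Read the whole file body out of the vendor bucket...
obj = vendor_s3.get_object(Bucket='vendor-bucket', Key='path/to/file.csv')
body = obj['Body'].read()

# ...and write it into our bucket. The bytes travel through this machine,
# which is what makes it slow.
our_s3.put_object(Bucket='our-bucket', Key='path/to/file.csv', Body=body)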

Is there any setup I can do on my end to copy directly? We are okay with using Lambda as well, as long as I get good performance. I cannot request any changes from the vendor other than the access keys they already give us.

Thanks, Tom

Upvotes: 1

Views: 721

Answers (1)

nicor88

Reputation: 1553

One of the fastest ways to copy data between two buckets is S3DistCp. It is only worth using if you have a lot of files to copy, since it copies them in a distributed way on an EMR cluster. A Lambda function with boto3 is an option only if the copy takes less than 5 minutes; if it takes longer, you can consider ECS tasks (basically Docker containers).

Regarding how to copy with boto3, you can check here. It looks like you can do something like this:

import boto3

s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')

source_bucket_name = 'src_bucket_name'
destination_bucket_name = 'dst_bucket_name'

# List every object under the prefix, 1000 keys per page
paginator = s3_client.get_paginator('list_objects')
response_iterator = paginator.paginate(
    Bucket=source_bucket_name,
    Prefix='your_prefix',
    PaginationConfig={
        'PageSize': 1000,
    }
)
# 'Contents' is absent when nothing matches the prefix, so fall back to []
objs = response_iterator.build_full_result().get('Contents', [])

keys_to_copy = [o['Key'] for o in objs]  # or use a generator: (o['Key'] for o in objs)

for key in keys_to_copy:
    print(key)
    copy_source = {
        'Bucket': source_bucket_name,
        'Key': key
    }
    # Managed copy: S3 copies the object server side, so the file body
    # never passes through the machine running this script
    s3_resource.meta.client.copy(copy_source, destination_bucket_name, key)

The proposed solution first gets the names of the objects to copy, then calls the copy command for each object. To make it faster, instead of a plain for loop you can run the copies concurrently.
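For example, a minimal sketch using a thread pool (boto3 clients are safe to share across threads; max_workers is just an illustrative value):

from concurrent.futures import ThreadPoolExecutor

def copy_key(key):
    copy_source = {
        'Bucket': source_bucket_name,
        'Key': key
    }
    s3_client.copy(copy_source, destination_bucket_name, key)
    return key

# Run up to 20 server-side copies at a time instead of one after another
with ThreadPoolExecutor(max_workers=20) as executor:
    for copied_key in executor.map(copy_key, keys_to_copy):
        print(copied_key)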

If you run the code in a Lambda function or an ECS task, remember to create an IAM role with access to both the source bucket and the destination bucket.
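One more detail for this particular question, where the source bucket sits in the vendor account: boto3's managed copy accepts a SourceClient argument for the requests it makes against the source object (for example the head_object call that determines the object size). A sketch with placeholder values; note that the copy request itself still runs under the destination client's credentials, so for a server-side copy those credentials also need read access on the source object:

import boto3

# Destination client: our account (e.g. the Lambda/ECS role mentioned above)
dest_client = boto3.client('s3')

# Source client: built from the vendor's access keys (placeholders)
source_client = boto3.client(
    's3',
    aws_access_key_id='VENDOR_ACCESS_ID',
    aws_secret_access_key='VENDOR_SECRET_KEY',
)

copy_source = {'Bucket': 'vendor-bucket', 'Key': 'path/to/file.csv'}

# SourceClient is used for the source-side lookups (like head_object);
# the copy itself is issued with dest_client's credentials, so those
# still need s3:GetObject on the vendor bucket for a server-side copy.
dest_client.copy(copy_source, 'our-bucket', 'path/to/file.csv',
                 SourceClient=source_client)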

Upvotes: 1
