Reputation: 373
I have two different accounts: 1) the vendor account, for which they gave us an access key ID and secret key, and 2) our own account, where we have full access.
We need to copy files from the vendor's S3 bucket to our S3 bucket using boto3 in Python 3.7 scripts.
What is the best function in boto3 to use to get the best performance?
I tried using get_object and put_object. The problem with that approach is that I am actually reading the file body and writing it back out. How do we just copy from one account to another in a faster copy mode?
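For reference, this is roughly what my current script does (bucket names, keys, and credentials below are placeholders):

import boto3

# Client on the vendor account, using the keys they gave us
src_s3 = boto3.client(
    's3',
    aws_access_key_id='VENDOR_ACCESS_KEY_ID',
    aws_secret_access_key='VENDOR_SECRET_KEY',
)
# Client on our own account
dst_s3 = boto3.client('s3')

# The whole file body is pulled down to this machine and then
# uploaded again, which is the slow part
obj = src_s3.get_object(Bucket='vendor-bucket', Key='some/key')
dst_s3.put_object(Bucket='our-bucket', Key='some/key', Body=obj['Body'].read())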
Is there any setup I can do on my end to copy directly? We are okay with using Lambda as well, as long as I get good performance. I cannot request any changes from the vendor other than giving us the access keys.
Thanks Tom
Upvotes: 1
Views: 721
Reputation: 1553
One of the fastest ways to copy data between 2 buckets is to use S3DistCp. It is worth using only if you have a lot of files to copy, since it copies them in a distributed way with an EMR cluster. A Lambda function with boto3 is an option only if the copy takes less than 5 minutes; if it takes longer, you can consider using ECS tasks (basically Docker containers) instead.
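As a rough sketch (the cluster id, bucket names, and prefix below are placeholders, and it assumes you already have a running EMR cluster), you can submit an S3DistCp step with boto3:

import boto3

emr = boto3.client('emr')

# Submit an S3DistCp step to an existing EMR cluster; EMR runs the
# copy distributed across the cluster's nodes
emr.add_job_flow_steps(
    JobFlowId='j-XXXXXXXXXXXXX',
    Steps=[{
        'Name': 'Copy vendor bucket to our bucket',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': [
                's3-dist-cp',
                '--src', 's3://src_bucket_name/your_prefix',
                '--dest', 's3://dst_bucket_name/your_prefix',
            ],
        },
    }]
)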
Regarding how to copy with boto3, you can check here. It looks like you can do something like this:
import boto3

s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')

source_bucket_name = 'src_bucket_name'
destination_bucket_name = 'dst_bucket_name'

# List every key under the prefix; the paginator transparently handles
# buckets with more than 1000 objects
paginator = s3_client.get_paginator('list_objects')
response_iterator = paginator.paginate(
    Bucket=source_bucket_name,
    Prefix='your_prefix',
    PaginationConfig={
        'PageSize': 1000,
    }
)
objs = response_iterator.build_full_result()['Contents']
keys_to_copy = [o['Key'] for o in objs]  # or use a generator: (o['Key'] for o in objs)

# Server-side copy: S3 moves the bytes bucket-to-bucket, so the file
# body is never downloaded to the machine running the script. The
# credentials used must be able to read the source bucket and write
# to the destination bucket.
for key in keys_to_copy:
    print(key)
    copy_source = {
        'Bucket': source_bucket_name,
        'Key': key
    }
    s3_resource.meta.client.copy(copy_source, destination_bucket_name, key)
The proposed solution first gets the names of the objects to copy, then calls the copy command for each object. To make it faster, instead of a sequential for loop you can run the copy calls concurrently, e.g. with a thread pool or asyncio, as in the sketch below.
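A minimal sketch with a thread pool (it reuses source_bucket_name, destination_bucket_name, s3_resource, and keys_to_copy from the snippet above; the pool size of 20 is just a starting point to tune):

from concurrent.futures import ThreadPoolExecutor

def copy_key(key):
    copy_source = {'Bucket': source_bucket_name, 'Key': key}
    s3_resource.meta.client.copy(copy_source, destination_bucket_name, key)

# boto3 clients are thread-safe, so the copies can run concurrently
with ThreadPoolExecutor(max_workers=20) as pool:
    list(pool.map(copy_key, keys_to_copy))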
If you run the code in a Lambda or an ECS task, remember to create an IAM role with access to both the source bucket and the destination bucket.
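A sketch of attaching such a policy with boto3 (the role name, policy name, and bucket names are placeholders; in your cross-account case the vendor's keys or a bucket policy on their side must still grant the read access):

import json
import boto3

iam = boto3.client('iam')

# Inline policy granting read on the source bucket and write on the
# destination bucket
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": [
                "arn:aws:s3:::src_bucket_name",
                "arn:aws:s3:::src_bucket_name/*",
            ],
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::dst_bucket_name/*",
        },
    ],
}

iam.put_role_policy(
    RoleName='copy-task-role',
    PolicyName='s3-copy-access',
    PolicyDocument=json.dumps(policy),
)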
Upvotes: 1