Reputation: 359
I need to clone an entire bucket via the AWS SDK for Python (boto3). But my bucket has more than 5 million objects, so calling `objects_response = client.list_objects_v2(Bucket=bucket_name)` page by page and then performing a copy on each object takes too much time, and it is not robust: if the process fails, it has to start over with that number of files. How can I do this in a faster and more reliable way?
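For reference, here is a minimal sketch of what I am doing now (bucket names are placeholders):

```python
import boto3

s3 = boto3.client('s3')

# Paginate through all objects in the source bucket
# (list_objects_v2 returns at most 1,000 keys per call)
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='source-bucket'):
    for obj in page.get('Contents', []):
        # Server-side copy of each object into the destination bucket
        s3.copy_object(
            Bucket='destination-bucket',
            Key=obj['Key'],
            CopySource={'Bucket': 'source-bucket', 'Key': obj['Key']},
        )
```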
Upvotes: 0
Views: 291
Reputation: 270134
AWS CLI s3 sync
The AWS Command-Line Interface (CLI) `aws s3 sync` command can copy S3 objects in parallel.
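The basic invocation copies everything from one bucket to another (bucket names are placeholders):

```bash
aws s3 sync s3://source-bucket s3://destination-bucket
```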
You can adjust the settings to enable more simultaneous copies via the AWS CLI S3 Configuration file.
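For example, this raises the number of concurrent requests from the default of 10 (50 is just an illustrative value):

```bash
aws configure set default.s3.max_concurrent_requests 50
```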
The `sync` command uses `CopyObject()` to copy objects, which tells S3 to copy the objects between buckets. Therefore, no data is downloaded or uploaded -- it just sends commands to S3 to manage the copy.
Running the `sync` command from an Amazon EC2 instance will reduce network latency, resulting in a faster copy (especially for many small objects).
You could improve copy speed by running several copies of `aws s3 sync` (preferably from multiple computers). For example, each could be responsible for copying a separate sub-directory.
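A sketch of that split, with hypothetical prefix names:

```bash
# Run each of these from a different terminal or machine
aws s3 sync s3://source-bucket/2023/ s3://destination-bucket/2023/
aws s3 sync s3://source-bucket/2024/ s3://destination-bucket/2024/
```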
See also: Improve transfer performance of sync command in Amazon S3
Amazon S3 Batch Operations
Amazon S3 itself can also perform a Copy operation on a large number of files by using Amazon S3 Batch Operations:
The entire process will be managed by Amazon S3.
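A minimal sketch of creating such a copy job with boto3, assuming a manifest of the objects (for example, an S3 Inventory report) already exists -- the account ID, ARNs, ETag, and bucket names are all placeholders:

```python
import boto3

s3control = boto3.client('s3control')

# All IDs and ARNs below are placeholders -- substitute your own
response = s3control.create_job(
    AccountId='111111111111',
    Operation={
        'S3PutObjectCopy': {
            # ARN of the destination bucket for the copied objects
            'TargetResource': 'arn:aws:s3:::destination-bucket',
        }
    },
    Manifest={
        'Spec': {
            'Format': 'S3BatchOperations_CSV_20180820',
            'Fields': ['Bucket', 'Key'],
        },
        'Location': {
            'ObjectArn': 'arn:aws:s3:::manifest-bucket/manifest.csv',
            'ETag': 'etag-of-the-manifest-object',
        },
    },
    Report={'Enabled': False},
    Priority=10,
    RoleArn='arn:aws:iam::111111111111:role/batch-operations-role',
    ConfirmationRequired=False,
)
print(response['JobId'])
```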
Upvotes: 1