MR.QUESTION

Reputation: 359

Clone entire instance of S3 bucket to another bucket

So I need to clone an entire S3 bucket to another bucket via the AWS SDK for Python (boto3). But my bucket has more than 5 million objects, so calling objects_response = client.list_objects_v2(Bucket=bucket_name) repeatedly and then performing a copy on each file takes too much time, and it is not safe: if the process fails, it has to start over with that number of files. How can I do this in a faster and more reliable way?

Upvotes: 0

Views: 291

Answers (1)

John Rotenstein

Reputation: 270134

AWS CLI s3 sync

The AWS Command-Line Interface (CLI) aws s3 sync command can copy S3 objects in parallel.

You can adjust the settings to enable more simultaneous copies via the AWS CLI S3 Configuration file.
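As a sketch, these settings live in the `s3` section of the CLI configuration and can be set with `aws configure set`; the numbers below are illustrative, not tuned recommendations, and the bucket names are placeholders:

```shell
# Raise the number of concurrent transfer tasks (default is 10)
aws configure set default.s3.max_concurrent_requests 50

# Allow a larger internal task queue (default is 1000)
aws configure set default.s3.max_queue_size 10000

# Then run the copy with the new settings in effect
aws s3 sync s3://source-bucket s3://destination-bucket
```

Higher concurrency helps most with many small objects, since each copy is a cheap server-side request rather than a data transfer.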

The sync command uses CopyObject() to copy objects, which tells S3 to copy the objects between buckets. Therefore, no data is downloaded/uploaded -- it just sends commands to S3 to manage the copy.

Running the sync command from an Amazon EC2 instance will reduce network latency, resulting in a faster copy (especially for many, smaller objects).

You could improve copy speed by running several copies of aws s3 sync (preferably from multiple computers). For example, each could be responsible for copying a separate sub-directory.
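One way to shard the work along those lines, assuming hypothetical bucket names and top-level prefixes, is to run one `sync` per prefix (each line could equally run on a different machine):

```shell
# Each invocation copies one top-level prefix; running them in
# parallel shells (or on separate EC2 instances) multiplies throughput.
aws s3 sync s3://source-bucket/2021/ s3://destination-bucket/2021/ &
aws s3 sync s3://source-bucket/2022/ s3://destination-bucket/2022/ &
aws s3 sync s3://source-bucket/2023/ s3://destination-bucket/2023/ &
wait   # block until all background syncs finish
```

Because `sync` only copies objects that are missing or changed in the destination, a failed run can simply be restarted and will resume where it left off.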

See also: Improve transfer performance of sync command in Amazon S3

Amazon S3 Batch Operations

Amazon S3 itself can also perform a Copy operation on a large number of files by using Amazon S3 Batch Operations:

  • First, create an Amazon S3 Inventory report that lists all the objects (or supply a CSV file with object names)
  • Then, create an S3 Batch Operations "Copy" job, pointing to the destination bucket

The entire process will be managed by Amazon S3.
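A minimal sketch of creating such a job with the `aws s3control create-job` command follows; the account ID, bucket names, manifest location, ETag, and IAM role ARN are all placeholders you would substitute with your own values:

```shell
# Sketch: submit an S3 Batch Operations copy job from a CSV manifest.
# All identifiers below are illustrative placeholders.
aws s3control create-job \
    --account-id 111122223333 \
    --operation '{"S3PutObjectCopy":{"TargetResource":"arn:aws:s3:::destination-bucket"}}' \
    --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::manifest-bucket/manifest.csv","ETag":"example-etag"}}' \
    --report '{"Bucket":"arn:aws:s3:::report-bucket","Prefix":"batch-reports","Format":"Report_CSV_20180820","Enabled":true,"ReportScope":"AllTasks"}' \
    --priority 10 \
    --role-arn arn:aws:iam::111122223333:role/batch-operations-role
```

S3 then tracks progress, retries failures, and writes a completion report, which addresses the "process fails and starts over" concern in the question.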

Upvotes: 1
