Walter

Reputation: 11

AWS S3 data transfer using AWS CLI

I am trying to transfer 25 TB of data stored in S3 from one AWS account to an S3 bucket in another AWS account (the two buckets are in different regions) using the AWS CLI. Can anyone suggest which EC2 instance is best to use, what process to follow for the transfer with the CLI, and, most importantly, how long the transfer is likely to take?

Upvotes: 0

Views: 885

Answers (1)

John Rotenstein

Reputation: 269101

Copying files

Copying is the easy part! Use the AWS Command-Line Interface (CLI):

aws s3 sync s3://source-bucket s3://destination-bucket

The data will be transferred directly between the buckets; it will not be downloaded and re-uploaded. Therefore, it doesn't matter what size EC2 instance you use. You can even run the command from your own computer and it will be just as fast. The CLI sends the necessary Copy commands to S3 for each file to be copied.

Using the sync command has the benefit that the copy can be resumed if something goes wrong, since it only copies files that are missing or updated since the previous sync.
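To make the "only copies files that are missing or updated" behaviour concrete, here is a minimal Python sketch of that decision. It is a simplified model, not the AWS CLI's actual code: it assumes the comparison is based on object size and last-modified time, and the names `ObjectInfo` and `keys_to_copy` are made up for illustration.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ObjectInfo(NamedTuple):
    """Hypothetical stand-in for one entry of a bucket listing."""
    size: int                # object size in bytes
    last_modified: datetime  # time of the last write


def keys_to_copy(source: dict[str, ObjectInfo],
                 destination: dict[str, ObjectInfo]) -> list[str]:
    """Return the keys a sync-style pass would copy, in sorted order.

    Assumed rule: copy when the key is missing from the destination,
    when the sizes differ, or when the source copy is newer.
    """
    pending = []
    for key, src in sorted(source.items()):
        dst = destination.get(key)
        if dst is None:
            pending.append(key)  # never copied before
        elif src.size != dst.size or src.last_modified > dst.last_modified:
            pending.append(key)  # changed since the previous pass
    return pending


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
    t1 = datetime(2020, 1, 2, tzinfo=timezone.utc)
    source = {
        "a.csv": ObjectInfo(100, t0),
        "b.csv": ObjectInfo(200, t1),
        "c.csv": ObjectInfo(300, t0),
    }
    destination = {
        "a.csv": ObjectInfo(100, t1),  # already copied, unchanged
        "b.csv": ObjectInfo(200, t0),  # copied earlier, source is newer
    }
    print(keys_to_copy(source, destination))
```

In this model an interrupted run simply leaves some keys missing from the destination, so the next run picks them up without redoing the rest.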

Permissions

What you will need to consider is how to permit access to copy the files. Let's say you have:

  • Account A with Bucket A
  • Account B with Bucket B
  • You wish to copy from Bucket A to Bucket B

You should run the sync command as a user ("User B") in Account B that has permission to write to Bucket B. Because the access is cross-account, User B's own IAM policy must also allow reading from Bucket A.

You will also need to add a Bucket Policy to Bucket A that specifically permits access by User B. The policy would look something like:

{
  "Id": "Policy1",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyAccess",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:user/user-b"
        ]
      }
    }
  ]
}

The arn value under Principal is the ARN of User B, and my-bucket stands for the name of Bucket A.
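If you would rather generate the policy than edit JSON by hand, a small Python sketch along these lines could do it. The function name `read_only_bucket_policy` and the example bucket name are hypothetical. Note that `s3:ListBucket` applies to the bucket itself while `s3:GetObject` applies to the objects inside it, so the sketch lists both the bucket ARN and the `/*` object ARN as resources.

```python
import json


def read_only_bucket_policy(bucket_name: str, user_arn: str) -> dict:
    """Build a read-only bucket policy for a single cross-account user.

    bucket_name: name of the source bucket (Bucket A).
    user_arn:    ARN of the IAM user in the other account (User B).
    """
    return {
        "Version": "2012-10-17",
        "Id": "Policy1",
        "Statement": [
            {
                "Sid": "ReadOnlyAccess",
                "Effect": "Allow",
                "Principal": {"AWS": [user_arn]},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # bucket, for ListBucket
                    f"arn:aws:s3:::{bucket_name}/*",  # objects, for GetObject
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own bucket name and user ARN.
    policy = read_only_bucket_policy(
        "my-bucket", "arn:aws:iam::123456789012:user/user-b"
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can then be pasted into the bucket policy editor for Bucket A.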

Timing

The transfer will be faster if the buckets are in the same region. However, I have no idea how long the transfer will take. 25 TB is actually a lot of data! (Have you ever tried copying 1 TB of data on a computer? It is slow!)
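For a rough feel of the numbers, you can turn an assumed sustained throughput into a duration. The Python sketch below is plain arithmetic. The throughput values are hypothetical placeholders, not measurements of S3, and the function name `estimate_transfer_hours` is made up for illustration.

```python
def estimate_transfer_hours(size_tb: float, throughput_mb_per_s: float) -> float:
    """Estimate transfer time in hours for a given sustained throughput.

    Uses decimal units: 1 TB = 1,000,000 MB.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = size_tb * 1_000_000
    seconds = size_mb / throughput_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical throughputs in MB/s, chosen only to show the scale.
    for rate in (50, 100, 500):
        hours = estimate_transfer_hours(25, rate)
        print(f"25 TB at {rate} MB/s: about {hours:.1f} hours")
```

The real rate depends on your data and setup, so a practical approach is to time a sync of a small prefix first and plug the observed rate into the calculation.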

The nice thing is that you can use the aws s3 sync command multiple times. Let's say you need the transfer to happen over a weekend. You could run the command during the week, and then run it again on the weekend. Only files that have been added/changed would be copied, so the final copy window would be quite small.

Upvotes: 2
