ezamur

Reputation: 2182

AWS Providing access to assumed role from another account to access S3 in my account

What I am trying to achieve is to copy objects from S3 in one account (A1 - not controlled by me) into S3 in another account (A2 - controlled by me). For that, ops from A1 provided me with a role I can assume using the boto3 library.

session = boto3.Session()
sts_client = session.client('sts')

assumed_role = sts_client.assume_role(
    RoleArn="arn:aws:iam::1234567890123:role/organization",
    RoleSessionName="blahblahblah"
)
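
For reference, the temporary credentials returned by assume_role would be passed to a new S3 client roughly like this (a minimal sketch; the variable names are just placeholders):

# Minimal sketch: wire the temporary credentials from assume_role into a client.
credentials = assumed_role['Credentials']
s3_as_assumed_role = boto3.client(
    's3',
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken']
)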

Assuming the role works fine. The problem is that a direct S3-to-S3 copy fails because that assumed role cannot access my S3 bucket.

s3 = boto3.resource('s3')
copy_source = {
    'Bucket': a1_bucket_name,
    'Key': key_name
}

bucket = s3.Bucket(a2_bucket_name)
bucket.copy(copy_source, hardcoded_key)

As a result of this I get

botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

in this line of code:

bucket.copy(copy_source, hardcoded_key)

Is there any way I can grant that assumed role access to my S3 bucket? I would really like to do a direct S3-to-S3 copy without downloading the file locally and uploading it again.

Please advise if there is a better approach than this.

The idea is to have this script run inside AWS Data Pipeline on a daily basis, for example.

Upvotes: 2

Views: 1775

Answers (2)

Matt Houser

Reputation: 36073

To copy objects from one S3 bucket to another S3 bucket, you need to use one set of AWS credentials that has access to both buckets.

If those buckets are in different AWS accounts, you need 2 things:

  1. Credentials for the target bucket, and
  2. A bucket policy on the source bucket allowing read access to the target AWS account.

With these alone, you can copy objects. You do not need credentials for the source account.

  1. Add a bucket policy to your source bucket allowing read access to the target AWS account.

Here is a sample policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DelegateS3Access",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:root"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME",
                "arn:aws:s3:::BUCKET_NAME/*"
            ]
        }
    ]
}

Be sure to replace BUCKET_NAME with your source bucket name, and replace 123456789012 with your target AWS account number.

  2. Using credentials for your target AWS account (the owner of the target bucket), perform the copy, as in the sketch below.
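
Here is a minimal sketch of that copy step in boto3, assuming the bucket policy above is already in place (the profile, bucket and key names are placeholders, not values from the question):

import boto3

# Use credentials for the target account, e.g. a named profile (placeholder name).
session = boto3.Session(profile_name='target-account')
s3 = session.resource('s3')

copy_source = {
    'Bucket': 'source-bucket-name',   # bucket in the source account
    'Key': 'path/to/object'
}

# The copy (and the HeadObject request it makes first) runs under the target
# account's credentials, which the source bucket policy now allows to read.
s3.Bucket('target-bucket-name').copy(copy_source, 'path/to/object')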

Additional Notes:

You can also copy objects by reversing the two requirements:

  1. Credentials for the source AWS account, and
  2. A bucket policy on the target bucket allowing write access to the source AWS account.

However, when done this way, object metadata does not get copied correctly. I have discussed this issue with AWS Support, and they recommend reading from the foreign account rather than writing to the foreign account to avoid this problem.

Upvotes: 2

Piyush Patil

Reputation: 14533

This is sample code to transfer data between two S3 buckets in two different AWS accounts. Note that it uses the legacy boto library and Python 2 syntax rather than boto3; a rough boto3 equivalent is sketched after the code.

from boto.s3.connection import S3Connection
from boto.s3.key import Key
from Queue import LifoQueue
import threading

source_aws_key = '*******************'
source_aws_secret_key = '*******************'
dest_aws_key = '*******************'
dest_aws_secret_key = '*******************'
srcBucketName = '*******************'
dstBucketName = '*******************'

class Worker(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.source_conn = S3Connection(source_aws_key, source_aws_secret_key)
        self.dest_conn = S3Connection(dest_aws_key, dest_aws_secret_key)
        self.srcBucket = self.source_conn.get_bucket(srcBucketName)
        self.dstBucket = self.dest_conn.get_bucket(dstBucketName)
        self.queue = queue

    def run(self):
        while True:
            key_name = self.queue.get()
            # get_key() issues a HEAD request and returns a Key with its
            # metadata (including etag) populated, or None if it does not exist.
            k = self.srcBucket.get_key(key_name)
            dist_key = self.dstBucket.get_key(key_name)
            if dist_key is None or k.etag != dist_key.etag:
                print 'copy: ' + key_name
                # Server-side copy performed under the destination credentials.
                self.dstBucket.copy_key(key_name, srcBucketName, key_name, storage_class=k.storage_class)
            else:
                print 'exists and etag matches: ' + key_name

            self.queue.task_done()

def copyBucket(maxKeys = 1000):
    print 'start'

    s_conn = S3Connection(source_aws_key, source_aws_secret_key)
    srcBucket = s_conn.get_bucket(srcBucketName)

    resultMarker = ''
    q = LifoQueue(maxsize=5000)

    for i in range(10):
        print 'adding worker'
        t = Worker(q)
        t.daemon = True
        t.start()

    while True:
        print 'fetch next 1000, backlog currently at %i' % q.qsize()
        keys = srcBucket.get_all_keys(max_keys = maxKeys, marker = resultMarker)
        for k in keys:
            q.put(k.key)
        if len(keys) < maxKeys:
            print 'Done'
            break
        resultMarker = keys[maxKeys - 1].key

    q.join()
    print 'done'

if __name__ == "__main__":
    copyBucket()
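
A rough boto3 equivalent of the same idea might look like this (a sketch only; the profile and bucket names are placeholders, and the destination credentials still need read access to the source bucket, e.g. via a bucket policy as in the other answer):

import boto3

src_bucket_name = 'source-bucket'        # placeholder
dst_bucket_name = 'destination-bucket'   # placeholder

# Two independent credential sets, e.g. named profiles for each account.
src_client = boto3.Session(profile_name='source-account').client('s3')
dst_bucket = boto3.Session(profile_name='destination-account').resource('s3').Bucket(dst_bucket_name)

# List keys with the source credentials, then issue server-side copies
# under the destination credentials.
paginator = src_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=src_bucket_name):
    for obj in page.get('Contents', []):
        dst_bucket.copy({'Bucket': src_bucket_name, 'Key': obj['Key']}, obj['Key'])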

Upvotes: 1

Related Questions