Reputation: 2182
What I am trying to achieve is to copy objects from S3 in one account (A1 - not controlled by me) into S3 in another account (A2 - controlled by me). For that, the ops team from A1 provided me a role I can assume, using the boto3 library.
import boto3

session = boto3.Session()
sts_client = session.client('sts')
assumed_role = sts_client.assume_role(
    RoleArn="arn:aws:iam::1234567890123:role/organization",
    RoleSessionName="blahblahblah"
)
This part is OK. The problem is that a direct S3-to-S3 copy fails because that assumed role cannot access my S3 bucket.
s3 = boto3.resource('s3')
copy_source = {
    'Bucket': a1_bucket_name,
    'Key': key_name
}
bucket = s3.Bucket(a2_bucket_name)
bucket.copy(copy_source, hardcoded_key)
As a result of this I get
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
in this line of code:
bucket.copy(copy_source, hardcoded_key)
Is there any way I can grant that assumed role access to my S3 bucket? I would really like to do a direct S3-to-S3 copy without downloading the file locally and uploading it again.
Please advise if there is a better approach than this.
The idea is to have this script running inside AWS Data Pipeline on a daily basis, for example.
Upvotes: 2
Views: 1775
Reputation: 36073
To copy objects from one S3 bucket to another S3 bucket, you need to use one set of AWS credentials that has access to both buckets.
If those buckets are in different AWS accounts, you need 2 things:
- credentials for the target (destination) account, and
- a bucket policy on the source bucket that grants read access to the target account.
With these alone, you can copy objects. You do not need credentials for the source account.
Here is a sample policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DelegateS3Access",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:root"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME",
                "arn:aws:s3:::BUCKET_NAME/*"
            ]
        }
    ]
}
Be sure to replace BUCKET_NAME with your source bucket name, and replace 123456789012 with your target AWS account number.
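With that policy in place on the source bucket, a plain boto3 copy using only your target-account credentials should work. Here is a minimal sketch (the bucket names and the key are placeholders):
import boto3

# These credentials must belong to the target (destination) account;
# the bucket policy on the source bucket lets that account read the objects.
s3 = boto3.resource('s3')

copy_source = {
    'Bucket': 'SOURCE_BUCKET_NAME',
    'Key': 'path/to/object'
}

# Direct server-side S3-to-S3 copy into the bucket you own; nothing is downloaded locally.
s3.Bucket('TARGET_BUCKET_NAME').copy(copy_source, 'path/to/object')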
Additional Notes:
You can also copy objects by reversing the two requirements:
- credentials for the source account, and
- a bucket policy on the target bucket that grants write access to the source account.
However, when done this way, object metadata does not get copied correctly. I have discussed this issue with AWS Support, and they recommend reading from the foreign account rather than writing to the foreign account to avoid this problem.
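For completeness, the reversed approach would look roughly like the sketch below (profile and bucket names are placeholders). When writing into the foreign bucket, you usually also need to grant bucket-owner-full-control so the bucket owner can actually use the copied object:
import boto3

# Session built from SOURCE-account credentials (placeholder profile name).
source_session = boto3.Session(profile_name='source-account')
s3 = source_session.resource('s3')

copy_source = {
    'Bucket': 'SOURCE_BUCKET_NAME',
    'Key': 'path/to/object'
}

# Writing into the target bucket owned by the other account; the ACL grant
# gives the bucket owner full control over the newly copied object.
s3.Bucket('TARGET_BUCKET_NAME').copy(
    copy_source,
    'path/to/object',
    ExtraArgs={'ACL': 'bucket-owner-full-control'}
)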
Upvotes: 2
Reputation: 14533
This is sample code to transfer data between two S3 buckets in two different AWS accounts. Note that it uses the legacy boto library (boto 2) and Python 2, not boto3.
from boto.s3.connection import S3Connection
from boto.s3.key import Key
from Queue import LifoQueue
import threading

source_aws_key = '*******************'
source_aws_secret_key = '*******************'
dest_aws_key = '*******************'
dest_aws_secret_key = '*******************'
srcBucketName = '*******************'
dstBucketName = '*******************'

class Worker(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.source_conn = S3Connection(source_aws_key, source_aws_secret_key)
        self.dest_conn = S3Connection(dest_aws_key, dest_aws_secret_key)
        self.srcBucket = self.source_conn.get_bucket(srcBucketName)
        self.dstBucket = self.dest_conn.get_bucket(dstBucketName)
        self.queue = queue

    def run(self):
        while True:
            key_name = self.queue.get()
            k = Key(self.srcBucket, key_name)
            dist_key = Key(self.dstBucket, k.key)
            if not dist_key.exists() or k.etag != dist_key.etag:
                print 'copy: ' + k.key
                self.dstBucket.copy_key(k.key, srcBucketName, k.key, storage_class=k.storage_class)
            else:
                print 'exists and etag matches: ' + k.key
            self.queue.task_done()

def copyBucket(maxKeys=1000):
    print 'start'
    s_conn = S3Connection(source_aws_key, source_aws_secret_key)
    srcBucket = s_conn.get_bucket(srcBucketName)
    resultMarker = ''
    q = LifoQueue(maxsize=5000)

    for i in range(10):
        print 'adding worker'
        t = Worker(q)
        t.daemon = True
        t.start()

    while True:
        print 'fetch next 1000, backlog currently at %i' % q.qsize()
        keys = srcBucket.get_all_keys(max_keys=maxKeys, marker=resultMarker)
        for k in keys:
            q.put(k.key)
        if len(keys) < maxKeys:
            print 'Done'
            break
        resultMarker = keys[maxKeys - 1].key

    q.join()
    print 'done'

if __name__ == "__main__":
    copyBucket()
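Since the snippet above uses the legacy boto library, a rough boto3 equivalent of the same loop might look like this sketch (profile and bucket names are placeholders; as in the answer above, the destination credentials still need read access to the source bucket for the server-side copy to succeed):
import boto3

# Placeholder profile and bucket names.
source_session = boto3.Session(profile_name='source-account')
dest_session = boto3.Session(profile_name='dest-account')

source_client = source_session.client('s3')
dest_bucket = dest_session.resource('s3').Bucket('DEST_BUCKET_NAME')

# List keys with the source-account credentials, then copy each object with the
# destination-account credentials (which must be allowed to read the source bucket).
paginator = source_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='SOURCE_BUCKET_NAME'):
    for obj in page.get('Contents', []):
        dest_bucket.copy({'Bucket': 'SOURCE_BUCKET_NAME', 'Key': obj['Key']}, obj['Key'])
        print('copied: ' + obj['Key'])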
Upvotes: 1