anaz8

Reputation: 125

Rename files while copying between cross-account S3 buckets

I am copying multiple Parquet files between S3 buckets in different AWS accounts, and I want to rename the files as they are copied to the destination bucket.
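The renaming itself is just a string transformation on each object key: keep the file name (everything after the last `/`) and prepend the destination folder and a prefix. A minimal sketch of that step on its own, assuming the `dest_folder/` and `xyz` values used in the code; the helper name `renamed_key` is hypothetical:

```python
import posixpath


def renamed_key(source_key: str, dest_folder: str = "dest_folder/", name_prefix: str = "xyz") -> str:
    """Map a source object key to its renamed destination key (hypothetical helper).

    S3 uses '/' as the folder delimiter by convention, so posixpath is used
    instead of os.path (which would use '\\' on Windows).
    """
    if source_key.endswith("/"):
        # A key ending in '/' is a zero-byte "folder" placeholder, not a file.
        raise ValueError(f"not a file key: {source_key!r}")
    file_name = posixpath.basename(source_key)  # part after the last '/'
    return posixpath.join(dest_folder, name_prefix + file_name)


print(renamed_key("source_folder/part-00000.parquet"))
# dest_folder/xyzpart-00000.parquet
```

Keys ending in `/` are rejected because listing `source_folder/` can return the folder placeholder object itself, and copying it would create an empty object named `dest_folder/xyz`.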

import boto3
s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')

bucket = 'sourcebucket'
folder_path = 'source_folder/'

resp = s3_client.list_objects(Bucket=bucket, Prefix=folder_path)
keys = []
for obj in resp['Contents']:
    keys.append(obj['Key'])


for key in keys:
    copy_source = {
        'Bucket': 'sourcebucket',
        'Key': key
    }
    file_name = key.split('/')[-1]
    s3_file = 'dest_folder/' + 'xyz' + file_name
    bucketdest = s3_resource.Bucket('destinationbucket')
    bucketdest.copy(copy_source, s3_file, ExtraArgs={'GrantFullControl': 'id = " "'})

This is what I have tried. The files appear in my destination bucket with the new names, but they contain no data.

Thanks!

Upvotes: 1

Views: 662

Answers (1)

John Rotenstein

Reputation: 270154

Your code is working perfectly fine for me! (However, I ran it without the ExtraArgs since I didn't have an ID.)
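One way to check whether the copies really are empty is to compare each source object's size with the size of its copy. A minimal sketch using the client's `head_object` call, assuming credentials that can read both buckets; `find_size_mismatches` and `FakeS3` are hypothetical names, and `FakeS3` is only an offline stand-in so the logic can be tried without AWS:

```python
from typing import Iterable, List, Tuple


def find_size_mismatches(s3_client, source_bucket: str, dest_bucket: str,
                         key_pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int, int]]:
    """Return (source_key, dest_key, source_size, dest_size) for every copy whose size differs."""
    mismatches = []
    for source_key, dest_key in key_pairs:
        # head_object returns the object's metadata, including its size in bytes.
        src_size = s3_client.head_object(Bucket=source_bucket, Key=source_key)["ContentLength"]
        dst_size = s3_client.head_object(Bucket=dest_bucket, Key=dest_key)["ContentLength"]
        if src_size != dst_size:
            mismatches.append((source_key, dest_key, src_size, dst_size))
    return mismatches


class FakeS3:
    """Offline stand-in for boto3.client('s3'); only implements head_object."""

    def __init__(self, sizes):
        self.sizes = sizes  # {(bucket, key): size_in_bytes}

    def head_object(self, Bucket, Key):
        return {"ContentLength": self.sizes[(Bucket, Key)]}


fake = FakeS3({
    ("sourcebucket", "source_folder/a.parquet"): 2048,
    ("destinationbucket", "dest_folder/xyza.parquet"): 0,
})
print(find_size_mismatches(fake, "sourcebucket", "destinationbucket",
                           [("source_folder/a.parquet", "dest_folder/xyza.parquet")]))
# [('source_folder/a.parquet', 'dest_folder/xyza.parquet', 2048, 0)]
```

With real credentials, pass `boto3.client('s3')` in place of the fake.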

When I copy objects between buckets, the rules I use are:

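  • If possible, 'pull' the files from the other account (that is, run the copy with the destination account's credentials)
  • If 'pushing' files to another account, I set ExtraArgs={'ACL':'bucket-owner-full-control'} so that the destination bucket's owner gets full control of the copied objects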
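The push-versus-pull rule only changes whether the ACL is sent with the copy request. A minimal sketch that builds the keyword arguments for the client's `copy_object` call; the helper name `copy_kwargs` and its `pushing` flag are hypothetical:

```python
def copy_kwargs(source_bucket: str, source_key: str, target_bucket: str,
                target_key: str, pushing: bool) -> dict:
    """Build keyword arguments for s3_client.copy_object (hypothetical helper)."""
    kwargs = {
        "Bucket": target_bucket,
        "Key": target_key,
        "CopySource": {"Bucket": source_bucket, "Key": source_key},
    }
    if pushing:
        # Writing into another account's bucket: grant that bucket's owner
        # full control of the new object.
        kwargs["ACL"] = "bucket-owner-full-control"
    return kwargs


# Usage with a real client (requires AWS credentials):
#   s3_client.copy_object(**copy_kwargs("sourcebucket", "source_folder/a.parquet",
#                                       "destinationbucket", "dest_folder/xyza.parquet",
#                                       pushing=True))
```

When pulling, the destination account writes into its own bucket, so no ACL is added.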

I doubt this small change would affect the contents of your objects, though.

By the way, it is a good idea to use either Client methods or Resource methods, not both. Mixing them makes the code harder to follow and can lead to problems.

So, you could use something like:

Client method:

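import boto3

s3_client = boto3.client('s3')

source_bucket = 'sourcebucket'
source_prefix = 'source_folder/'
target_bucket = 'destinationbucket'

# list_objects returns at most 1,000 keys per call
response = s3_client.list_objects(Bucket=source_bucket, Prefix=source_prefix)

for obj in response['Contents']:
    copy_source = {
        'Bucket': source_bucket,
        'Key': obj['Key']
    }
    s3_client.copy_object(
        Bucket=target_bucket,
        Key='dest_folder/' + 'xyz' + obj['Key'].split('/')[-1],
        CopySource=copy_source,
        ACL='bucket-owner-full-control'
    )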

or you could use:

Resource method:

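import boto3

s3_resource = boto3.resource('s3')

source_bucket = 'sourcebucket'
source_prefix = 'source_folder/'
target_bucket = 'destinationbucket'

# objects.filter() pages through all matching keys automatically
for obj in s3_resource.Bucket(source_bucket).objects.filter(Prefix=source_prefix):
    copy_source = {
        'Bucket': source_bucket,
        'Key': obj.key
    }
    s3_resource.Bucket(target_bucket).copy(
        CopySource=copy_source,
        Key='dest_folder/' + 'xyz' + obj.key.split('/')[-1],
        ExtraArgs={'ACL': 'bucket-owner-full-control'}
    )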

(Warning: I didn't test those snippets.)
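`list_objects` returns at most 1,000 keys per call, so the Client method would miss objects in a larger folder. A minimal sketch of the same loop using the client's `list_objects_v2` paginator, which also skips zero-byte "folder" keys; `copy_with_rename` is a hypothetical name, and `FakeS3` is an offline stand-in (not part of boto3) so the loop can be tried without AWS. Note that `copy_object` copies objects up to 5 GB in a single request; larger objects need the managed `copy` method.

```python
def copy_with_rename(s3_client, source_bucket, source_prefix, target_bucket,
                     dest_folder="dest_folder/", name_prefix="xyz"):
    """Copy every object under source_prefix, renaming it on the way (hypothetical helper)."""
    copied = []
    paginator = s3_client.get_paginator("list_objects_v2")
    # paginate() keeps requesting pages until every matching key has been listed.
    for page in paginator.paginate(Bucket=source_bucket, Prefix=source_prefix):
        for obj in page.get("Contents", []):  # "Contents" is absent when a page has no keys
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            new_key = dest_folder + name_prefix + key.split("/")[-1]
            s3_client.copy_object(
                Bucket=target_bucket,
                Key=new_key,
                CopySource={"Bucket": source_bucket, "Key": key},
                ACL="bucket-owner-full-control",
            )
            copied.append((key, new_key))
    return copied


class FakeS3:
    """Offline stand-in for boto3.client('s3') that serves pre-built listing pages."""

    def __init__(self, pages):
        self.pages = pages
        self.copies = []  # records the kwargs of every copy_object call

    def get_paginator(self, operation_name):
        assert operation_name == "list_objects_v2"
        return self

    def paginate(self, **kwargs):
        return iter(self.pages)

    def copy_object(self, **kwargs):
        self.copies.append(kwargs)


fake = FakeS3([
    {"Contents": [{"Key": "source_folder/"}, {"Key": "source_folder/a.parquet"}]},
    {"Contents": [{"Key": "source_folder/b.parquet"}]},
])
print(copy_with_rename(fake, "sourcebucket", "source_folder/", "destinationbucket"))
# [('source_folder/a.parquet', 'dest_folder/xyza.parquet'), ('source_folder/b.parquet', 'dest_folder/xyzb.parquet')]
```

With real credentials, pass `boto3.client('s3')` in place of the fake.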

Upvotes: 2
