Gal

Reputation: 719

Move files between two AWS S3 buckets using boto3

I have to move files from one bucket to another with the Python Boto API. (I need it to "Cut" the file from the first bucket and "Paste" it in the second one.) What is the best way to do that?

Note: Does it matter if I have two different ACCESS KEYS and SECRET KEYS?

Upvotes: 58

Views: 171748

Answers (14)

Cyrill22

Reputation: 37

Using different accounts can be challenging. You need to create two different sessions to access both buckets simultaneously. If this is your case, you can do the following:

import boto3

# create a session for the source account
sessionSource = boto3.Session(
    aws_access_key_id='ACCESS_KEY_SOURCE',
    aws_secret_access_key='SECRET_KEY_SOURCE',
    region_name='REGION'
)

# create a session for the target account
sessionTarget = boto3.Session(
    aws_access_key_id='ACCESS_KEY_TARGET',
    aws_secret_access_key='SECRET_KEY_TARGET',
    region_name='REGION'
)

# derive an S3 client from each session
source_client = sessionSource.client('s3')
target_client = sessionTarget.client('s3')

Then you can move the file by uploading it to the target bucket and, after that, deleting it from the source bucket:

fileToTransfer = source_client.get_object(
    Bucket="source-bucket-name",
    Key='file-key'
)

target_client.upload_fileobj(
    fileToTransfer['Body'],
    'target-bucket-name',
    'file-key',
)

source_client.delete_object(
    Bucket="source-bucket-name",
    Key='file-key'
)

Upvotes: 0

Shneor Elmaleh

Reputation: 51

To move an object from one directory to another:

import boto3

def move_s3_object(bucket: str, old_key: str, new_key: str) -> None:
    boto3.resource('s3').Object(bucket, new_key).copy_from(CopySource=f'{bucket}/{old_key}')
    boto3.client('s3').delete_object(Bucket=bucket, Key=old_key)


# example:
move_s3_object('my_bucket', old_key='tmp/test.txt', new_key='tmp/tmp2/test.txt')

This might even work with two different buckets, but I haven't tested that; a sketch for that case follows.
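A minimal sketch of that two-bucket case (untested here; the function name and bucket names are made up, and it assumes one set of credentials can reach both buckets):

import boto3

def move_s3_object_across_buckets(src_bucket: str, src_key: str, dst_bucket: str, dst_key: str) -> None:
    s3 = boto3.resource('s3')
    # server-side copy into the destination bucket, then remove the original
    s3.Object(dst_bucket, dst_key).copy_from(CopySource={'Bucket': src_bucket, 'Key': src_key})
    s3.Object(src_bucket, src_key).delete()


# example:
move_s3_object_across_buckets('bucket_a', 'tmp/test.txt', 'bucket_b', 'tmp/test.txt')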

Upvotes: 1

SV125

Reputation: 325

  1. On the source AWS account, add this policy to the source S3 bucket:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::SOURCE_BUCKET_NAME",
                "arn:aws:s3:::SOURCE_BUCKET_NAME/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::DESTINATION_BUCKET_NAME",
                "arn:aws:s3:::DESTINATION_BUCKET_NAME/*"
            ]
        }
    ]
}
  2. Using the destination account's credentials:
boto3_session = boto3.Session(aws_access_key_id=<your access key>,
                              aws_secret_access_key=<your secret_access_key>)
s3_resource = boto3_session.resource('s3')
bucket = s3_resource.Bucket("<source bucket name>")

for obj in bucket.objects.all():
    obj_path = str(obj.key)

    copy_source = {
        'Bucket': "<source bucket name>",
        'Key': obj_path
    }
    s3_resource.meta.client.copy(copy_source, "<destination bucket name>", obj_path)

Upvotes: 0

Leonshi96

Reputation: 1

I did this to move files between two S3 locations.

It handles the following scenarios:

  • If you want to move files with specific prefixes in their names
  • If you want to move them between 2 subfolders within the same bucket
  • If you want to move them between 2 buckets

import boto3
s3 = boto3.resource('s3')

vBucketName = 'xyz-data-store'
#Source and Target Bucket Instantiation
vTargetBkt = s3.Bucket('xyz-data-store')
vSourceBkt = s3.Bucket('xyz-data-store')

#List of File name prefixes you want to move
vSourcePath = ['abc/1/test1_', 'abc/1/test2_'
               ,'abc/1/test3_','abc/1/test4_']
#List of Folder names you want the files to be moved to
vTargetPath = ['abc/1/test1_', 'abc/1/test2_'
               ,'abc/1/test3_','abc/1/test4_']

for (sP, tP) in zip(vSourcePath, vTargetPath):
    for se_files in vSourceBkt.objects.filter(Prefix=sP, Delimiter='/'):
        SourceFileName = (se_files.key).split('/')[-1]
        copy_source = {
            'Bucket': vSourceBkt.name,
            'Key': se_files.key
        }
        #print('SourceFileName ' + SourceFileName)
        #print('se_files ' + se_files.key)
        TargetFileName = "{}{}".format(tP, SourceFileName)
        print('TargetFileName ' + TargetFileName)
        s3.meta.client.copy(copy_source, vBucketName, TargetFileName)

        #Delete files in the source once the code is working

Upvotes: 0

Renuá Meireles

Reputation: 51

Copying between different or the same buckets can easily be done in boto3:

import boto3
s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
bucket = s3.Bucket('otherbucket')
bucket.copy(copy_source, 'otherkey')

# This is a managed transfer that will perform a multipart copy in
# multiple threads if necessary.

Upvotes: 4

Kumar Ashutosh

Reputation: 1

It can be done easily with the s3fs library.

import s3fs

src = 'source_bucket'
dst = 'destination_bucket'

s3 = s3fs.S3FileSystem(anon=False,key='aws_s3_key',secret='aws_s3_secret_key')

for i in s3.ls(src,refresh=True): # loading the file names
    if 'file_name' in i:          # checking the file name
        s3.mv(i,dst)              # moving file to destination

Here's the documentation: https://s3fs.readthedocs.io/en/latest/

Upvotes: -2

Mohamed Jaleel Nazir

Reputation: 5821

I hope this answer helps. Thanks @agrawalramakant.

import boto3


# object_key = 'posts/0173c352-f9f8-4bf1-a818-c99b4c9b0c18.jpg'
def move_from_s3_to_s3(object_key):
    session_src = boto3.session.Session(aws_access_key_id="",
                                        region_name="ap-south-1",
                                        aws_secret_access_key="")

    source_s3_r = session_src.resource('s3')

    session_dest = boto3.session.Session(aws_access_key_id="",
                                         region_name="ap-south-1",
                                         aws_secret_access_key="")

    dest_s3_r = session_dest.resource('s3')
    # create a reference to source image
    old_obj = source_s3_r.Object('source_bucket_name', object_key)

    # create a reference for destination image
    new_obj = dest_s3_r.Object('dest_bucket_name', object_key)

    # upload the object to the destination S3 bucket
    new_obj.put(Body=old_obj.get()['Body'].read())

    # delete the source object to complete the move
    old_obj.delete()

Upvotes: 0

Freek Wiekmeijer

Reputation: 4940

I think the boto S3 documentation answers your question.

https://github.com/boto/boto/blob/develop/docs/source/s3_tut.rst

Moving files from one bucket to another via boto is effectively a copy of the keys from source to destination and then removing the key from source.

You can get access to the buckets:

import boto

c = boto.connect_s3()
src = c.get_bucket('my_source_bucket')
dst = c.get_bucket('my_destination_bucket')

and iterate the keys:

for k in src.list():
    # copy stuff to your destination here
    dst.copy_key(k.key.name, src.name, k.key.name)
    # then delete the source key
    k.delete()

See also: Is it possible to copy all files from one S3 bucket to another with s3cmd?

Upvotes: 41

Ganesh Kharad

Reputation: 341

This is the code I used to move files between sub-directories of an S3 bucket.

# =============================================================================
# CODE TO MOVE FILES within subfolders in S3 BUCKET
# =============================================================================

from boto3.session import Session

ACCESS_KEY = 'a_key'
SECRET_KEY = 's_key'

session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')        # S3 resource, used for copy/delete
s3client = session.client('s3')    # S3 client, used for listing objects

resp_dw = s3client.list_objects(Bucket='main_bucket', Prefix='sub_folder/', Delimiter="/")

# all file keys from the listing (list_objects returns at most 1000 keys per call);
# the first entry is the folder placeholder itself, so skip it
forms2_dw = [x['Key'] for x in resp_dw['Contents'][1:]]
reload_no = 0

while len(forms2_dw) != 0:
    total_files = len(forms2_dw)

    for i in range(total_files):
        # destination folder name derived from the object's LastModified date
        foldername = resp_dw['Contents'][1:][i]['LastModified'].strftime('%Y%m%d')
        my_bcket = 'main_bucket'

        my_file_old = resp_dw['Contents'][1:][i]['Key']  # source key
        zip_filename = my_file_old.split('/')[-1]
        my_file_new = 'new_sub_folder/' + foldername + "/" + zip_filename  # destination key

        print(str(reload_no) + ':::  copying from====:' + my_file_old + ' to :=====' + my_file_new)

        if zip_filename[-4:] == '.zip':
            s3.Object(my_bcket, my_file_new).copy_from(CopySource=my_bcket + '/' + my_file_old)
            s3.Object(my_bcket, my_file_old).delete()

            print(str(i) + ' files moved of ' + str(total_files))

    # list again to pick up any remaining objects under the source prefix
    resp_dw = s3client.list_objects(Bucket='main_bucket', Prefix='sub_folder/', Delimiter="/")
    forms2_dw = [x['Key'] for x in resp_dw['Contents'][1:]]
    reload_no += 1

Upvotes: -2

agrawalramakant

Reputation: 170

If you have 2 different buckets with different access credentials, store the credentials accordingly in the credentials and config files under the ~/.aws folder.
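For example, ~/.aws/credentials could define the two profiles like this (the profile names are placeholders matching the code below, and the key values are made up):

[source_profile_name]
aws_access_key_id = SOURCE_ACCESS_KEY_ID
aws_secret_access_key = SOURCE_SECRET_ACCESS_KEY

[dest_profile_name]
aws_access_key_id = DEST_ACCESS_KEY_ID
aws_secret_access_key = DEST_SECRET_ACCESS_KEY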

You can use the following to copy an object from one bucket with one set of credentials and then save it in the other bucket with the other set of credentials:

import boto3


session_src = boto3.session.Session(profile_name=<source_profile_name>)
source_s3_r = session_src.resource('s3')

session_dest = boto3.session.Session(profile_name=<dest_profile_name>)
dest_s3_r = session_dest.resource('s3')

# create a reference to source image
old_obj = source_s3_r.Object(<source_s3_bucket_name>, <prefix_path> + <key_name>)

# create a reference for destination image
new_obj = dest_s3_r.Object(<dest_s3_bucket_name>, old_obj.key)

# upload the image to destination S3 object
new_obj.put(Body=old_obj.get()['Body'].read())

Neither bucket needs to grant the other access through ACLs or bucket policies.

Upvotes: 13

Tom Wojcik

Reputation: 6179

If you want to

Create a copy of an object that is already stored in Amazon S3.

then copy_object is the way to go in boto3.

How I do it:

import boto3

aws_access_key_id = ""
aws_secret_access_key = ""
bucket_from = ""
bucket_to = ""
s3 = boto3.resource(
    's3',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)
src = s3.Bucket(bucket_from)

def move_files():
    for archive in src.objects.all():
        # filters on archive.key might be applied here

        s3.meta.client.copy_object(
            ACL='public-read',
            Bucket=bucket_to,
            CopySource={'Bucket': bucket_from, 'Key': archive.key},
            Key=archive.key
        )

move_files()

Upvotes: 6

David Arenburg

Reputation: 92282

If you are using boto3 (the newer boto version), this is quite simple:

import boto3
s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')

(Docs)
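Note that copy alone leaves the original object in place. Since the question asks for a move, a minimal sketch of copy-then-delete with the same made-up bucket and key names (the delete_object call is the only addition):

import boto3

s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}
# copy first, then delete the original to complete the move
s3.meta.client.copy(copy_source, 'otherbucket', 'otherkey')
s3.meta.client.delete_object(Bucket='mybucket', Key='mykey')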

Upvotes: 55

Artem Fedosov

Reputation: 2203

For me, awscli does the job 30 times faster than boto copying and deleting each key, probably due to multithreading in awscli. If you still want to run it from your Python script without calling shell commands from it, you may try something like this:

Install awscli python package:

sudo pip install awscli

And then it is as simple as this:

import os
if os.environ.get('LC_CTYPE', '') == 'UTF-8':
    os.environ['LC_CTYPE'] = 'en_US.UTF-8'

from awscli.clidriver import create_clidriver
driver = create_clidriver()
driver.main('s3 mv source_bucket target_bucket --recursive'.split())

Upvotes: 12

SathishVenkat

Reputation: 99

The names passed to copy_key must be strings, not objects. The change below worked for me:

for k in src.list():
    dst.copy_key(k.key, src.name, k.key)

Upvotes: 3
