Reputation: 471
Currently, I have 30 million files in one folder in an S3 bucket. I want to split them across 4 folders in the same bucket, moving 7.5 million files into each folder.
I tried the AWS CLI command below, but I don't know how to limit the number of files it moves:
aws s3 mv s3://BUCKETNAME/myfolder/ s3://BUCKETNAME/folder1/ --recursive
How can I loop and move only 7.5 million files into each folder?
import boto3

aws_access_key_id = ""
aws_secret_access_key = ""
bucket_from = ""
bucket_to = ""

s3 = boto3.resource(
    's3',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key
)
src = s3.Bucket(bucket_from)

def move_files():
    for archive in src.objects.all():
        s3.meta.client.copy_object(
            ACL='public-read',
            Bucket=bucket_to,
            CopySource={'Bucket': bucket_from, 'Key': archive.key},
            Key=archive.key
        )

move_files()
Upvotes: 2
Views: 2596
Reputation: 269091
I would recommend:
1. Obtain object listing using Amazon S3 Inventory
Listing millions of objects can take a long time. Instead, use Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects.
This will provide you with a definitive list of current objects.
2. Split into 4 lists
Use a text editor to split the file list into 4 separate files -- one for each of your destination folders.
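With 30 million lines, a text editor may struggle, so the split could also be scripted. Here is a minimal sketch, assuming the inventory CSV has already been downloaded and decompressed, has no header row, and has the bucket and key in its first two columns. The function name `split_inventory` and the file names in the usage comment are hypothetical.

```python
import csv

def split_inventory(src_path, dest_paths, chunk_size):
    """Stream rows from src_path into each destination file in turn.

    Each destination receives at most chunk_size rows (chunk_size must be
    positive), and only the first two columns (bucket, key) are kept.
    Rows beyond len(dest_paths) * chunk_size are not written anywhere.
    Returns the number of rows written to each destination file.
    """
    counts = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        for dest_path in dest_paths:
            written = 0
            with open(dest_path, "w", newline="") as dest:
                writer = csv.writer(dest)
                # The reader is a single iterator, so each destination
                # picks up where the previous one stopped.
                for row in reader:
                    writer.writerow(row[:2])
                    written += 1
                    if written == chunk_size:
                        break
            counts.append(written)
    return counts

# Usage (hypothetical file names):
# parts = [f"manifest-folder{n}.csv" for n in range(1, 5)]
# split_inventory("inventory.csv", parts, 7_500_000)
```

Because the rows are streamed, memory use stays small regardless of how many objects are listed.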
3. Use Amazon S3 Batch Operations to copy objects
Copying millions of objects would take a long time unless you multi-thread the process.
The easier and faster method would be to use S3 Batch Operations, which performs large-scale batch operations on Amazon S3 objects. It can take the S3 Inventory file as input and then perform all the copy operations for you in parallel.
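If you would rather create the jobs from code than from the console, something like the following might work. It is only a sketch: it builds the arguments for boto3's S3 Control `create_job` call, one job per manifest file. The helper name `build_copy_job_request`, the priority value, and every ARN in the usage comment are placeholders, and the IAM role must already exist with permission to read the manifest and copy the objects. As far as I understand, the copy operation keeps the original key and places it under the target prefix, so check the resulting key names on a small test manifest first.

```python
import uuid

def build_copy_job_request(account_id, role_arn, manifest_arn, manifest_etag,
                           target_bucket_arn, target_prefix):
    """Return keyword arguments for the S3 Control create_job call."""
    return {
        "AccountId": account_id,
        # Set to True to require confirmation before the job runs.
        "ConfirmationRequired": False,
        "Operation": {
            "S3PutObjectCopy": {
                "TargetResource": target_bucket_arn,
                "TargetKeyPrefix": target_prefix,
            }
        },
        "Manifest": {
            "Spec": {
                # Plain CSV manifest with bucket,key columns.
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        "Report": {"Enabled": False},  # no completion report in this sketch
        "Priority": 10,  # arbitrary value chosen for illustration
        "RoleArn": role_arn,
        "ClientRequestToken": str(uuid.uuid4()),
    }

# Usage (not run here; needs credentials, and all values are placeholders):
# import boto3
# s3control = boto3.client("s3control")
# request = build_copy_job_request(
#     "123456789012",
#     "arn:aws:iam::123456789012:role/my-batch-role",
#     "arn:aws:s3:::BUCKETNAME/manifests/manifest-folder1.csv",
#     "manifest-object-etag",
#     "arn:aws:s3:::BUCKETNAME",
#     "folder1",
# )
# s3control.create_job(**request)
```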
4. Clean-up
I recommend that you do not delete the source files until you are sure that all the copying was done correctly. You can again use S3 Inventory to obtain a list for comparison purposes.
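The comparison itself could look roughly like this, assuming you have loaded the keys from the before and after listings into Python and that each copied object keeps its file name under one of the new folder prefixes. The function names are hypothetical, and holding tens of millions of keys in a set needs several gigabytes of memory, so a sorted-file diff may suit better at full scale.

```python
def relative_name(key, prefixes):
    """Strip the first matching prefix from key; return None if none match."""
    for prefix in prefixes:
        if key.startswith(prefix):
            return key[len(prefix):]
    return None

def find_missing(source_keys, copied_keys, source_prefix, dest_prefixes):
    """Return source keys that have no copy under any destination prefix."""
    copied = {relative_name(k, dest_prefixes) for k in copied_keys}
    copied.discard(None)  # keys outside the destination folders
    return sorted(
        k for k in source_keys
        if k.startswith(source_prefix) and k[len(source_prefix):] not in copied
    )

# Usage (hypothetical variables holding keys from the two listings):
# missing = find_missing(source_keys, copied_keys, "myfolder/",
#                        ["folder1/", "folder2/", "folder3/", "folder4/"])
```

An empty result suggests every source object has a counterpart, though it does not verify sizes or contents.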
When you are ready to delete the source files, you can use an S3 Lifecycle rule to delete the original objects. Be very careful that you do not delete the copied objects at the same time! For this reason alone, it might be better to copy the objects to a different bucket from the source files.
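As a rough illustration of such a rule, the sketch below builds a lifecycle configuration that expires only objects under one prefix. The helper name `build_expiration_rule` and the rule ID format are made up for this example. It refuses an empty prefix because that would match every object in the bucket. Note that `put_bucket_lifecycle_configuration` replaces the bucket's existing lifecycle configuration, so any rules you already have must be included in the same call.

```python
def build_expiration_rule(prefix, days=1):
    """Return a lifecycle configuration expiring objects under prefix."""
    if not prefix:
        # An empty prefix would apply the rule to the whole bucket.
        raise ValueError("prefix must not be empty")
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }

# Usage (not run here; needs credentials):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="BUCKETNAME",
#     LifecycleConfiguration=build_expiration_rule("myfolder/"),
# )
```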
Upvotes: 1