Reputation: 11
I have uploaded 365 files 1 files per day to S3 bucket all at one go. Now All the files have the same upload date. I want to Move the file which are more than 6 months to S3 Glacier. S3 lifecycle policy will take effect after 6 months as all the files upload date to s3 is same. The actual date of the files upload is stored in DynamoDb table with S3KeyUrl. I want to know the best way to be able to move file to s3 Glacier. I came up with the following approach
Create the S3 Lifecycle policy to move file to s3 Glacier which will work after 6 month.
Create a app to Query DynamoDB Table to get the list of files which are more than 6 months and download the file from s3 (as it allows uploading files from local directory) and use ArchiveTransferManager (Amazon.Glacier.Transfer) to the file to s3 glacier vault.
Upvotes: 1
Views: 602
Reputation: 269091
There are two versions of Glacier:
Glacier
and Glacier Deep Archive
Trust me... You do not want to use the 'original' Glacier. It is slow and difficult to use. So, avoid anything that mentions Vaults and Archives.
Instead, you simply want to change the Storage Class of the objects in Amazon S3.
Normally, the easiest way to do this is to "Edit storage class" in the S3 management console. However, you mention Millions of objects, so this wouldn't be feasible.
Instead, you will need to copy objects over themselves, while changing the storage class. This can be done with the AWS CLI:
aws s3 cp s3://<bucket-name>/ s3://<bucket-name>/ --recursive --storage-class <storage_class>
Note that this would change the storage class for all objects in the given bucket/path. Since you only wish to selectively change the storage class, you would either need to issue lots of the above commands (each for only one object), or you could use an AWS SDK to script the process. For example, you could write a Python program that loops through the list of objects, checks DynamoDB to determine whether the object is '6 months old' and then copies it over itself with the new Storage Class.
See: StackOverflow: How to change storage class of existing key via boto3
If you have millions of objects, it can take a long time to merely list the objects. Therefore, you could consider using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects. You could then use this CSV file as the 'input list' for your 'copy' operation rather than having to list the bucket itself.
Or, just be lazy (which is always more productive!) and archive everything to Glacier. Then, if somebody actually needs one of the files in the next 6 months, simply restore it from Glacier before use. So simple!
Upvotes: 2