Reputation: 527
The main question I have is:
How can I move files based on a date range, without incurring client-side API calls that cost money?
Background:
I want to download a subset of files from an AWS S3 bucket onto a Linux server, but there are millions of them in ONE folder, with nothing differentiating them except a sequence number, and I need a subset of these based on creation date. (Actually, the files contain an event timestamp, so I want to reduce the bulk first by creation date.)
Frankly, I have no idea what costs I incur every time I do an ls on that dataset, e.g. for testing.
Right now I am considering:
aws s3api list-objects --bucket "${S3_BUCKET}" --prefix "${path_from}" --query "Contents[?LastModified>='${low_extract_date}'].{Key: Key}"
but that is client-side, if I understand correctly. So I would like to first move the relevant files to a different folder, based on creation date, and then just run aws s3 ls on that set.
Is that possible?
Because in that case, I would either do it that way, or some other way?
And: is that cheaper than listing the files using the query?
thanks!
PS, to clarify: I wish to do a server-side operation to reduce the set initially, and then list the result.
Upvotes: 1
Views: 1169
Reputation: 35188
I believe a good approach to this would be the following:
Restructure the keys so that each object is prefixed by its creation date in Y/m/d format, e.g. prefix/randomfile.txt might become 2020/07/04/randomfile.txt. If you're planning on scrapping the rest of the files, then move them to a new bucket rather than to a new prefix in the same bucket. You can then address a whole month at once via the prefix 2020/07.
From the CLI you can move a file using the following syntax:
aws s3 mv s3://bucketname/prefix/randomfile.txt s3://bucketname/2020/07/04/randomfile.txt
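Building on that mv command, a one-off backfill could loop over the existing objects and move each one into a date prefix derived from its LastModified timestamp. This is only a sketch: bucketname and prefix/ are placeholders, it assumes keys contain no whitespace, and note that each mv is still one request per object.

```shell
# Sketch: move every object under prefix/ into a Y/m/d prefix derived
# from its LastModified date. "bucketname" and "prefix/" are placeholders.
aws s3api list-objects-v2 --bucket bucketname --prefix prefix/ \
  --query 'Contents[].[Key,LastModified]' --output text |
while read -r key modified; do
  # 2020-07-04T12:34:56.000Z -> 2020/07/04
  datepath=$(printf '%s' "$modified" | cut -c1-10 | tr '-' '/')
  aws s3 mv "s3://bucketname/$key" "s3://bucketname/$datepath/$(basename "$key")"
done
```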
To copy all the files under a specific prefix you could run the following on the CLI (note that copying a prefix requires --recursive):
aws s3 cp s3://bucketname/2020/07 . --recursive
To get files on a specific date you can run the below (the backticks are escaped so that the shell expands $DATE while JMESPath still receives its literal quotes):
aws s3api list-objects-v2 --bucket bucketname --query "Contents[?contains(LastModified, \`$DATE\`)]"
The results of running this could then be fed into further CLI commands, for example to move the matching keys.
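For example, a sketch of that pipeline (bucketname and the target prefix are placeholders, and keys are assumed to contain no whitespace) could pipe the matching keys straight into s3 mv, so the copy itself stays server-side:

```shell
# Sketch: move all objects whose LastModified contains $DATE into a
# date-named prefix. The backticks are escaped so the shell expands
# $DATE while passing literal backticks through to JMESPath.
DATE=2020-07-04
aws s3api list-objects-v2 --bucket bucketname \
  --query "Contents[?contains(LastModified, \`$DATE\`)].Key" --output text |
tr '\t' '\n' |
while read -r key; do
  aws s3 mv "s3://bucketname/$key" "s3://bucketname/$DATE/$(basename "$key")"
done
```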
Upvotes: 1