user3055034

Reputation: 643

Copy files from S3 bucket to local machine using file index

I need to copy one file from each of many subdirectories in an S3 bucket to my local machine. The file name is auto-generated and would be difficult to obtain without first using ls, but I do know that the target file is always the 2nd file in the subfolder when ordered by creation date.

Is there a way to reference a file in an S3 bucket subfolder by index?

I am envisioning doing this with the AWS CLI, though I'm open to other suggestions.

Upvotes: 0

Views: 1267

Answers (2)

John Rotenstein

Reputation: 269340

You could use this method to obtain the name of the second file in a given bucket/path:

aws s3api list-objects-v2 --bucket BUCKET-NAME --query 'Contents[1].Key' --output text

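To restrict the listing to a path, add --prefix PATH/ (the --bucket parameter takes only the bucket name, not BUCKET-NAME/PATH). Note that S3 returns objects in key (alphabetical) order, not by date, so Contents[1] is the second key rather than the second-oldest object. To select by date instead, change the query to 'sort_by(Contents, &LastModified)[1].Key'.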

However, you mention that you have many subdirectories, so you would have to know the names of all those subdirectories if you want to avoid doing a full bucket listing.
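If a scripted approach is acceptable, the same idea can be sketched in Python with boto3: discover the subfolders with a delimiter listing, sort each subfolder's objects by LastModified, and download the second one. This is a minimal sketch, not a tested tool. The function names and the flattened local file naming are hypothetical, and it assumes S3's LastModified is a good enough stand-in for creation date and that the subfolders sit directly under the bucket root.

```python
def nth_oldest_key(objects, n=2):
    """Return the key of the n-th oldest object (1-based), or None if too few."""
    ordered = sorted(objects, key=lambda obj: obj["LastModified"])
    return ordered[n - 1]["Key"] if len(ordered) >= n else None


def list_objects(s3, bucket, prefix):
    """Yield every object under a prefix, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])


def list_subfolders(s3, bucket, prefix=""):
    """Yield the immediate 'subfolder' prefixes directly under prefix."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for common in page.get("CommonPrefixes", []):
            yield common["Prefix"]


def download_second_oldest(bucket, dest_dir="."):
    """Download the 2nd oldest object of each top-level subfolder."""
    import os
    import boto3  # imported here so the helpers above work without it

    s3 = boto3.client("s3")
    for folder in list_subfolders(s3, bucket):
        key = nth_oldest_key(list_objects(s3, bucket, folder), n=2)
        if key is None:
            continue  # fewer than two objects in this subfolder
        # Hypothetical naming choice: flatten the key into one file name.
        target = os.path.join(dest_dir, key.replace("/", "_"))
        s3.download_file(bucket, key, target)
```

Calling download_second_oldest("BUCKET-NAME") would still list every object in each subfolder, so it does not avoid the listing cost. Zero-byte "folder" placeholder objects created by the S3 console would also be counted, so you may need to filter out keys ending in "/".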

Upvotes: 1

jarmod

Reputation: 78653

I'm not aware of any way within S3 to list the second oldest object without listing all objects at a given prefix and then explicitly sorting that list by date. If you need to do this then here are a few ideas:

  1. if objects are only ever added (never deleted), then you could perhaps use a key naming convention when objects are uploaded that allows you to easily locate the 2nd oldest object, e.g. 0001-xxx, 0002-xxx. Then you can find the 2nd oldest object by listing objects with prefix 0002.
  2. maintain an independent index of the objects in an RDBMS or KV database that allows you to easily locate the S3 key of the 2nd oldest object in any part of your S3 hierarchy. Possibly the DB is maintained via a Lambda function called when objects are put or deleted.
  3. use a Lambda function triggered on object PUT that enumerates all of the objects in the relevant 'folder' and writes the key of the 2nd oldest object back to a kind of index object in that same folder (or as metadata on a known index object). Then you can find the 2nd oldest by getting the contents of the index object (or its metadata).
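As a rough illustration of idea 3, a Lambda handler along these lines could rebuild the index object whenever something is uploaded. This is only a sketch under several assumptions: the index object name `_second_oldest.txt` is hypothetical, each 'folder' is treated as flat (objects in nested prefixes would also be counted), and the function's role is assumed to have permission to list and put objects.

```python
import posixpath
from urllib.parse import unquote_plus

INDEX_NAME = "_second_oldest.txt"  # hypothetical name for the index object


def update_index(s3, bucket, folder):
    """Write the key of the 2nd oldest object in folder to the index object."""
    paginator = s3.get_paginator("list_objects_v2")
    objects = []
    for page in paginator.paginate(Bucket=bucket, Prefix=folder):
        objects.extend(page.get("Contents", []))
    index_key = folder + INDEX_NAME
    # Ignore the index object itself when ranking by date.
    candidates = sorted(
        (obj for obj in objects if obj["Key"] != index_key),
        key=lambda obj: obj["LastModified"],
    )
    if len(candidates) < 2:
        return None
    second = candidates[1]["Key"]
    s3.put_object(Bucket=bucket, Key=index_key, Body=second.encode("utf-8"))
    return second


def handler(event, context):
    """Entry point for S3 object-created notifications."""
    import boto3  # available in the Lambda Python runtime

    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(INDEX_NAME):
            continue  # skip our own write so the function does not loop
        folder = posixpath.dirname(key)
        folder = folder + "/" if folder else ""
        update_index(s3, bucket, folder)
```

A client would then read the index object for a folder and use its contents as the key to download. Because the second-oldest object stops changing once two objects exist (unless objects are deleted), the function could also return early when the index is already present.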

Option #2 might be the best as it's simple, fast, and flexible (what if, as your app changes over time, you find that you also need to know the 4th oldest object, or the 2nd newest object?).
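To make the flexibility of option #2 concrete, here is a minimal sketch of such an index using SQLite as a stand-in for whatever RDBMS or KV store you choose. The table layout and function names are hypothetical, and it assumes timestamps are stored as ISO 8601 UTC strings in one consistent format so that they sort correctly as text.

```python
import sqlite3


def create_index(conn):
    """Create the index table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        " key TEXT PRIMARY KEY,"
        " folder TEXT NOT NULL,"
        " last_modified TEXT NOT NULL)"
    )


def record_put(conn, key, last_modified):
    """Record an uploaded object; called when an object is put."""
    folder = (key.rsplit("/", 1)[0] + "/") if "/" in key else ""
    conn.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)",
        (key, folder, last_modified),
    )


def record_delete(conn, key):
    """Remove an object from the index; called when an object is deleted."""
    conn.execute("DELETE FROM objects WHERE key = ?", (key,))


def nth_key(conn, folder, n=2, newest_first=False):
    """Return the n-th oldest (or newest) key in a folder, or None."""
    order = "DESC" if newest_first else "ASC"
    row = conn.execute(
        "SELECT key FROM objects WHERE folder = ?"
        f" ORDER BY last_modified {order}, key LIMIT 1 OFFSET ?",
        (folder, n - 1),
    ).fetchone()
    return row[0] if row else None
```

With this layout, nth_key(conn, "reports/", n=4) or nth_key(conn, "reports/", n=2, newest_first=True) answers the follow-up questions without touching S3. The record_put and record_delete calls would come from the Lambda function mentioned in option 2.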

Upvotes: 1
