Storing S3 URLs vs. calling listObjects

Question

I have an app with an attachments feature. Users can upload documents to S3 and later revisit, preview, and/or download them.

I was planning to store the S3 URLs in the DB and pre-sign them when the user needs them. One caveat I'm finding is that S3 and the DB can drift apart.

For example, a file could be removed from S3 while its URL remains in the DB (or vice versa). This leads to data inconsistency and may mislead users.

Instead, I was thinking of fetching the URLs over the network with listObjects from the S3 client SDK. I don't really need to store the URLs, and this guarantees the user sees what's actually in S3.

The only con is that it costs one API request (as opposed to a DB hit).

Any insights?

Thanks!

John Rotenstein · Accepted Answer

Using a database to store an index of your files is a good idea, especially once the volume of objects increases. The ListObjects() API returns at most 1,000 objects per call, so larger listings have to be paginated. This might be okay if every user has their own path (so you can use ListObjects(Prefix='user1/')), but that's not ideal if you want to allow document sharing between users.
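To illustrate the per-user prefix listing, here is a minimal sketch in Python with boto3. It uses ListObjectsV2 (the newer version of the same call) through boto3's paginator, so it keeps going past the 1,000-object page limit. The function name iter_keys and the bucket name in the usage comment are hypothetical; the client is passed in rather than created inside the function.

```python
from typing import Any, Iterator


def iter_keys(s3_client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (bucket name is a placeholder):
#   import boto3
#   keys = list(iter_keys(boto3.client("s3"), "my-app-attachments", "user1/"))
```

Note that this only works cleanly while each user's files live under a single prefix; a shared document would need to be found some other way.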

Obtaining a listing from a database will be faster, and it has the advantage that you can filter on attributes and metadata.
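As a rough sketch of what that index could look like, the example below uses SQLite from the Python standard library. The table layout, column names, and function names are all assumptions made for illustration. It stores the object key rather than a full URL, and only generates a pre-signed URL (via boto3's generate_presigned_url) at the moment the user asks for the file.

```python
import sqlite3
from typing import Any, List, Tuple

# Hypothetical schema: one row per uploaded attachment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS attachments (
    s3_key       TEXT PRIMARY KEY,
    owner_id     TEXT NOT NULL,
    filename     TEXT NOT NULL,
    content_type TEXT NOT NULL
)
"""


def find_attachments(
    db: sqlite3.Connection, owner_id: str, content_type: str
) -> List[Tuple[str, str]]:
    """Return (s3_key, filename) pairs for one user, filtered by content type."""
    rows = db.execute(
        "SELECT s3_key, filename FROM attachments "
        "WHERE owner_id = ? AND content_type = ? ORDER BY filename",
        (owner_id, content_type),
    )
    return rows.fetchall()


def presign_download(
    s3_client: Any, bucket: str, s3_key: str, expires_in: int = 300
) -> str:
    """Create a short-lived download URL for a stored key (boto3 S3 client)."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": s3_key},
        ExpiresIn=expires_in,
    )
```

A filter like "all PDFs owned by user1" is a single indexed query here, whereas S3 can only filter a listing by key prefix.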

The two systems will only get "out of sync" if objects are created/deleted outside of your app, or if there is an error in the app. If this concerns you, then use Amazon S3 Inventory to produce a regular listing of the objects in the bucket, and write some code to compare it against the database entries. This will highlight anything that is going wrong.
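The comparison itself can be very small. The sketch below assumes you have already extracted the object keys from the inventory report and from the database into two iterables (parsing the inventory files is left out); the names Drift and find_drift are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class Drift(NamedTuple):
    missing_from_s3: Set[str]  # indexed in the database, absent from the bucket
    missing_from_db: Set[str]  # present in the bucket, never indexed


def find_drift(inventory_keys: Iterable[str], db_keys: Iterable[str]) -> Drift:
    """Compare the keys listed by S3 Inventory with the keys in the database."""
    in_s3 = set(inventory_keys)
    in_db = set(db_keys)
    return Drift(missing_from_s3=in_db - in_s3, missing_from_db=in_s3 - in_db)
```

Because inventory reports are delivered on a schedule (daily or weekly), objects uploaded after the latest report will show up as missing from S3, so you may want to ignore database rows newer than the report before raising an alarm.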

While Amazon S3 is an excellent NoSQL database (Key = filename, Value = contents), it isn't well suited to searching or listing a large number of objects.
