Reputation: 110970
The app I'm building allows a user to upload a file. The file is uploaded to Amazon S3 in a private bucket.
Then users can download the file, which we allow by creating a time-expiring URL:
AWS::S3::S3Object.url_for(attachment.path(style || attachment.default_style), attachment.bucket_name, :expires_in => expires_in, :use_ssl => true)
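(For reference, the same time-expiring URL idea can be sketched with boto3 in Python; the bucket and key names below are placeholders.)
import boto3

s3 = boto3.client("s3")

# Generate a time-limited download URL for a private object.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-private-bucket", "Key": "uploads/report.pdf"},
    ExpiresIn=600,  # the URL stops working after 10 minutes
)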
The problem we're having is that there is a short delay from upload to availability via AWS::S3::S3Object.url_for. If users try to download the file right after the upload, the request fails with:
215412-NameError (uninitialized constant Attachment::AWS):
215413- app/models/attachment.rb:32:in `authenticated_url'
215414- app/controllers/attachments_controller.rb:33:in `show'
Any ideas on how to handle or work around this delay?
Thanks
Upvotes: 7
Views: 8196
Reputation: 271
Know the S3 Consistency Model
New objects:
S3 has strong read-after-write consistency for new objects in all regions, meaning that if you have never issued a GET or HEAD request for that key before, you can read the object immediately after a successful PUT.
Overwrites or deletes:
S3 offers strong consistency for overwrites, unless you previously performed a GET or HEAD on that object within the last minute (due to internal caching). In that case you might see eventual consistency, where a subsequent read briefly returns the old version. So if you are overwriting an existing key that you have recently read, there can be a short delay (seconds) before the new data is fully visible.
Easiest Workaround:
Use a New Key Each Time
Instead of overwriting the same S3 key, use a unique key for each new upload. This sidesteps the overwrite consistency issue entirely.
For example:
const Key = `${filename}-${Date.now()}.${fileExtension}`;
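The same idea as a rough Python (boto3) sketch; the bucket name and key prefix are placeholders:
import time
import boto3

s3 = boto3.client("s3")

def upload_with_unique_key(local_path, filename, ext):
    # A fresh key per upload means S3's strong read-after-write consistency
    # for new objects applies, so the file is readable as soon as the PUT succeeds.
    key = f"uploads/{filename}-{int(time.time() * 1000)}.{ext}"
    s3.upload_file(local_path, "my-private-bucket", key)
    return key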
Pros
Guaranteed immediate availability, since it is always a new object. S3's strong consistency for new objects means no polling is needed.
Cons
You may accumulate multiple versions/keys in S3. You’ll have to clean up older ones if you only want the latest.
Ref: https://aws.amazon.com/s3/consistency/
Upvotes: 0
Reputation: 264
I know it's been years, but for those who came here with the same issue, here is what I've found.
First of all, it's just how AWS S3 works:
A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.
The best way I have found to deal with this behaviour is to wait until the uploaded object appears in the list before allowing users to download it.
Something like:
_put_object(filename)

# Poll until the uploaded object becomes visible.
while True:
    if _file_exists(filename):
        break
    time.sleep(1)
To check availability we can use client.head_object or client.list_objects_v2; some report that list_objects_v2 responds faster.
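As a rough sketch, a _file_exists helper built on head_object might look like this (the bucket name is a placeholder; boto3 also provides an object_exists waiter that does similar polling for you):
import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-private-bucket"  # placeholder

def _file_exists(key):
    try:
        s3.head_object(Bucket=BUCKET, Key=key)
        return True
    except ClientError as e:
        # HEAD returns a bare 404 (no NoSuchKey body) when the object is missing.
        if e.response["Error"]["Code"] == "404":
            return False
        raise

def wait_until_available(key, timeout=30):
    # Poll until the object is visible or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if _file_exists(key):
            return True
        time.sleep(1)
    return False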
Upvotes: 5
Reputation: 10701
How long of a delay are you seeing? How often is this happening?
We upload directly to S3 from the browser using https://github.com/PRX/s3-swf-upload-plugin, and by the time I get a callback that the file exists, I have never seen an error about the file not yet being available.
Another thing we do is to mark the object with an initial state on first upload, then use an asynchronous process to validate the file, and only after it is marked valid do we go ahead and process it. This introduces a delay, however, so it may not be a great fit for you.
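A very rough sketch of that state-marking pattern in Python; the Attachment class, its fields, and the exists_check callable are made up for illustration:
import time

class Attachment:
    def __init__(self, key):
        self.key = key
        self.status = "uploaded"  # initial state, set right after upload

def validate_in_background(attachment, exists_check, interval=1):
    # Would run in an async worker; exists_check is whatever availability or
    # validity test you use (e.g. a head_object call against S3).
    while not exists_check(attachment.key):
        time.sleep(interval)
    attachment.status = "valid"  # only now do we allow further processing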
Upvotes: 2