What is the recommended way to handle large file uploads to s3?

Question

I'm using the AWS SDK for Ruby to upload large files from users to S3.

The server is a Sinatra app with a POST /images endpoint accepting multipart/form-data. I'm experiencing a noticeable delay with user uploads. This is to be expected, because the server makes the request to S3 synchronously. I considered moving this to a background job using something like Sidekiq, but I'm not sure I like that solution.

I read online that some people are promoting direct uploads to S3 from the client side. Some even called this a "best practice." I'm hesitant to do this for several reasons:

1. My client-side code would be tightly coupled to my cloud provider. I love AWS (great experiences), but I like to remain somewhat cloud-agnostic. I don't want my mobile and web apps to have to know the details of my AWS setup. If I choose to move away from S3 at a later date (unlikely but plausible), I would want this to be a seamless transition. This works fine for a web app, because I can always redeploy quickly. However, I have to worry about mobile. Users may not update, and everything will become a lot more complicated if some users are uploading to S3 and some are uploading to another service.
2. The business logic that determines which bucket and region to use would either need to exist on the client side, or I'd need to expose an endpoint that determines the bucket and region for each user. In the second case, I'd have to make a request to my server to get the parameters before I can begin uploading to S3. I want to be able to change buckets or re-route users to alternative regions, so I'm not a fan of this tight coupling or the additional request.
3. Security is a huge concern. When files are uploaded and processed through my server, I can use AWS IAM to ensure that these files only come from my server. With direct uploads, I believe I would have to grant an "all-write" privilege to users, which is problematic. If I use AWS IAM credentials in JavaScript, I do not see how I can ensure that users do not get unlimited write access to my bucket, since all client-side JavaScript can be read by the user. In addition, I'm unsure how to run validations. On my server, I can scan the files and decide whether or not to upload them to S3. If I upload directly from the client, I would have to move this processing into Lambda functions. I'm OK with that, but there is a chance the object could be retrieved by users before the processing has occurred. Then I'd have to build some sort of locking system to prevent access before processing.

So, the bottom line is I have no idea where to go from here. I've hacked around some solutions, but I'm not thrilled with any of them. I'd love to learn how other startups and enterprises are tackling this kind of problem. What would you recommend? How would you counter my arguments? Forgive me if I'm missing something; I'm still relatively new to AWS.

strongjz · Accepted Answer

1. If you're worried about changing the storage service later, I would suggest putting an API in front of it, so that you can change the backend storage behind your service. The mobile or web client calls the service, and your API places the file where it needs to go. You have more control over the API, and it could also just create a signed S3 URL to send to the client and let the client still do the uploading.
2. An API, as in 1, solves this problem too: the client doesn't have to do all the work of choosing a bucket and region.
3. Use the AWS Security Token Service (STS) and temporary security credentials.
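
As a rough illustration of the signed-URL idea in 1 and 2, here is a minimal sketch of the server-side helper such an API endpoint might call. Everything named here is an assumption for illustration: the `BUCKETS_BY_REGION` table, the `pending/` key prefix, and the `upload_target_for` helper are hypothetical and not from the question. The `presigner` argument is expected to behave like `Aws::S3::Presigner` from the aws-sdk-s3 gem, whose `presigned_url(:put_object, ...)` call returns a time-limited URL.

```ruby
require "securerandom"

# Hypothetical routing table: which bucket serves which user region.
# The bucket names are placeholders, not values from the question.
BUCKETS_BY_REGION = {
  "us" => "example-uploads-us",
  "eu" => "example-uploads-eu"
}.freeze

# Builds the upload instructions the API hands back to a client.
# `presigner` is any object responding to presigned_url(method, params),
# for example Aws::S3::Presigner from the aws-sdk-s3 gem.
def upload_target_for(user_region:, filename:, presigner:, expires_in: 900)
  # Bucket selection stays on the server; raises KeyError for unknown regions.
  bucket = BUCKETS_BY_REGION.fetch(user_region)

  # The server chooses the object key. The client-supplied file name only
  # contributes its (lowercased) extension.
  key = "pending/#{SecureRandom.uuid}#{File.extname(filename).downcase}"

  url = presigner.presigned_url(
    :put_object,
    bucket: bucket,
    key: key,
    expires_in: expires_in # seconds until the signed URL stops working
  )

  { method: "PUT", url: url, key: key, expires_in: expires_in }
end

# Usage with the real SDK (needs the aws-sdk-s3 gem and server-side credentials):
#   require "aws-sdk-s3"
#   client    = Aws::S3::Client.new(region: "us-east-1")
#   presigner = Aws::S3::Presigner.new(client: client)
#   upload_target_for(user_region: "us", filename: "cat.PNG", presigner: presigner)
```

In this sketch the client only ever sees an opaque URL and an HTTP method, so the bucket, region, and even the storage provider can change behind the API. The signed URL is limited to one object key and expires after `expires_in` seconds, so the client never holds IAM credentials.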
