Reputation: 2674
I'm working on an image hosting web site, and I'm in a bit of a pickle. I chose Amazon S3 because it's fast, it scales, and it has a pay-as-you-go pricing model.
When images are uploaded through my web site, I need to process them on the server: create 3 different sizes and insert data into the DB. Only then am I ready to upload all 3 sizes of the image to S3.
I'm currently using plupload for the uploading, and for testing purposes I've set it all up with my DB as storage for the images. But I just realized that uploading to my web server, processing, and THEN uploading to S3 would mean double the upload time per image, right?
Is there any smart way of handling this scenario?
Upvotes: 2
Views: 970
Reputation: 1609
Looking at your question again, it seems you're concerned about "upload time per image". Are you referring to the end user waiting for your web app? You don't have to wait until the images are resized and uploaded to S3 before returning a response to the uploading user. Once the user's upload is complete, queue a job and return a response to the user immediately. A separate background thread can then take jobs off the queue and do the image processing. When a job finishes, have it upload to S3 and update the database to reflect that the resized images are there.
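A minimal sketch of that queue-and-worker idea, using only the Python standard library. The names `handle_upload`, `resize`, `upload_to_s3`, and the `done` dict (standing in for the database) are all hypothetical placeholders, not part of plupload or any AWS SDK; a real version would call an imaging library and an S3 client where the stubs are.

```python
import queue
import threading

jobs = queue.Queue()   # pending image-processing jobs
done = {}              # stand-in for the DB: image_id -> list of stored keys

SIZES = ("small", "medium", "large")


def resize(data, size):
    # Placeholder: a real version would call an imaging library here.
    return size.encode() + b":" + data


def upload_to_s3(key, data):
    # Placeholder: a real version would send `data` to S3 under `key`.
    return key


def worker():
    """Take jobs off the queue, resize, upload, then record the result."""
    while True:
        image_id, data = jobs.get()
        try:
            keys = [upload_to_s3(f"{image_id}/{size}", resize(data, size))
                    for size in SIZES]
            done[image_id] = keys   # "update the database"
        finally:
            jobs.task_done()


def start_worker():
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def handle_upload(image_id, data):
    """Called once the user's upload completes; returns without waiting."""
    jobs.put((image_id, data))
    return {"status": "queued", "id": image_id}
```

The request handler only pays for `jobs.put`, so the user's response does not wait on resizing or on the second transfer. In a multi-process web server the in-memory queue would have to be replaced by something shared between processes.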
Upvotes: 1
Reputation: 1609
We are doing something similar with processing of files and storage in S3. The main difference is that our webservers and processing servers are on EC2, so they don't incur any transfer cost to go to/from S3, and they have very high bandwidth to S3. Is it possible to run your image resizing process on an EC2 instance? You can either:
Accept uploads directly to a server process on the EC2 instance, process them immediately, and then save the images to S3, or...
Upload images directly to S3, somehow notify your EC2 process of the images' arrival (an SQS queue, perhaps), and then have your server process respond to the signal by grabbing the files from S3, processing the images, and saving the resized images back to S3.
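The second option can be sketched roughly as follows. To keep it runnable without AWS credentials, a plain dict stands in for the S3 bucket and a `queue.Queue` stands in for the SQS queue; `client_upload`, `process_next`, `resize`, and the `size/key` naming scheme are assumptions made for illustration only.

```python
import queue

bucket = {}               # stand-in for an S3 bucket: key -> bytes
signals = queue.Queue()   # stand-in for an SQS queue carrying object keys

SIZES = ("small", "medium", "large")


def client_upload(key, data):
    """Simulates the direct-to-S3 upload followed by the arrival signal."""
    bucket[key] = data
    signals.put(key)


def resize(data, size):
    # Placeholder: a real version would call an imaging library here.
    return size.encode() + b":" + data


def process_next():
    """Handle one signal: fetch the original, store each resized copy."""
    key = signals.get()
    original = bucket[key]            # "grab the file from S3"
    for size in SIZES:
        bucket[f"{size}/{key}"] = resize(original, size)
    signals.task_done()
    return key
```

With this layout the web server never touches the image bytes; only the EC2 worker reads from and writes to the bucket.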
Basically, I'm saying you should take advantage of this (from S3 description on AWS site):
There is no Data Transfer charge for data transferred between Amazon EC2 and Amazon S3 within the same Region or for data transferred between the Amazon EC2 Northern Virginia Region and the Amazon S3 US Standard Region.
Upvotes: 2
Reputation: 5804
Of course you'll need more time to save the images, because you'll actually have two HTTP transfers. You can try to execute the uploads in parallel with other operations, for example by uploading the original image to S3 while the smaller sizes are still being generated.
I suspect that uploading the original/largest image takes longer than generating the two smaller ones, so a parallel solution would work very well. Even if uploading takes less time than generating the two other images, the upload is not stealing much CPU time, so in all cases you should see improvements.
Of course it's more complex, especially if you think about error handling.
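One way this could look, as a sketch using `concurrent.futures` from the standard library. `store_image`, `resize`, and `upload_to_s3` are hypothetical names, and the `uploaded` dict stands in for S3; the upload stub would be a real network call in practice.

```python
from concurrent.futures import ThreadPoolExecutor

uploaded = {}   # stand-in for S3: key -> bytes


def upload_to_s3(key, data):
    # Placeholder for the network transfer to S3.
    uploaded[key] = data
    return key


def resize(data, size):
    # Placeholder: a real version would call an imaging library here.
    return size.encode() + b":" + data


def store_image(image_id, original):
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Start uploading the original (largest) image right away...
        pending = [pool.submit(upload_to_s3, f"{image_id}/large", original)]
        # ...and generate the smaller sizes while that transfer is in flight.
        for size in ("medium", "small"):
            resized = resize(original, size)
            pending.append(
                pool.submit(upload_to_s3, f"{image_id}/{size}", resized))
        # result() re-raises any exception raised inside an upload,
        # which is where error handling would hook in.
        return [future.result() for future in pending]
```

Threads suit this case because the upload is mostly waiting on the network, so it leaves the CPU free for resizing.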
Upvotes: 1