saulgoodman

Reputation: 124

File Service Architecture & Cost Analysis

Context

I am developing a web app that

  1. Takes a URL from the user
  2. Downloads and stores the associated file on my server
  3. Lets the user fetch the file from my server at any time until the file eventually expires and is removed

I am planning to deploy this application on AWS, specifically using EC2 and S3.

Challenge

I am trying to come up with a design for this service that is both cost-effective and performant.

Analysis

The analysis assumes the following application flow:

  1. Internet → EC2: Download the file onto local storage
  2. EC2 → S3: Upload the downloaded file to S3, then delete the local copy on EC2
  3. EC2 → User: Provide the user with a direct URL to fetch the file from S3
  4. S3 → User: The user fetches the file from S3
  5. S3: The file is removed after 24 hours

In terms of network performance, steps 1 and 2 will be the bottlenecks, since the EC2 instance has limited download and upload bandwidth. Step 4 should not be a problem, since S3 handles the bandwidth for transferring the file to the end user.
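Steps 1–3 above could be sketched roughly as follows (a minimal sketch using boto3 and the standard library; the key scheme and the 24-hour link expiry are illustrative assumptions, not part of the question):

```python
import hashlib
import os
import tempfile
import urllib.request
from urllib.parse import urlparse


def object_key_for(url: str) -> str:
    """Derive a stable S3 key from the source URL (short hash + filename)."""
    digest = hashlib.sha256(url.encode()).hexdigest()[:16]
    name = os.path.basename(urlparse(url).path) or "file"
    return f"{digest}/{name}"


def fetch_and_store(url: str, bucket: str) -> str:
    """Step 1: download to local disk; step 2: upload to S3 and delete the
    local copy; step 3: return a pre-signed GET URL for the user."""
    import boto3  # imported lazily so the key helper works without boto3

    s3 = boto3.client("s3")
    key = object_key_for(url)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        local_path = tmp.name
    try:
        urllib.request.urlretrieve(url, local_path)   # step 1
        s3.upload_file(local_path, bucket, key)       # step 2
    finally:
        os.remove(local_path)                         # free EC2 disk space
    # Step 3: a time-limited URL the user fetches directly from S3.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=24 * 3600,
    )
```

For step 5, an S3 lifecycle expiration rule on the bucket can delete objects after a day without any application code, though lifecycle granularity is whole days rather than exactly 24 hours.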

In terms of costs, the fixed cost is the EC2 instances, and the main variable cost is step 4, where AWS charges $0.09/GB for data transfer out to the Internet. Since the files are removed after 24 hours, the storage fee is comparatively tiny.
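To put rough numbers on this (the monthly volume and per-GB rates below are illustrative assumptions, not AWS quotes):

```python
# Hypothetical monthly traffic, not taken from the question:
egress_gb = 1000            # data served to users per month
egress_rate = 0.09          # USD per GB, S3 -> Internet transfer
storage_rate = 0.023        # USD per GB-month, typical S3 Standard rate

transfer_cost = egress_gb * egress_rate

# With a 24-hour expiry, the average data at rest is roughly one day's
# worth of ingest, so storage is charged on ~1/30 of the monthly volume.
avg_stored_gb = egress_gb / 30
storage_cost = avg_stored_gb * storage_rate

print(f"transfer: ${transfer_cost:.2f}/mo, storage: ${storage_cost:.2f}/mo")
# transfer dominates: $90.00/mo vs. roughly $0.77/mo for storage
```

This supports the intuition that transfer out, not storage, dominates the variable cost.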

Question

  1. Have I correctly identified the performance bottlenecks in this application flow?

  2. Is my cost analysis correct?

  3. Is this the optimal flow in terms of costs? Is there any way to further reduce the cost?

  4. Since steps 1 and 2 (downloading from the Internet and uploading to S3) will consume a lot of bandwidth when multiple large files are transferred simultaneously, will this significantly affect the responsiveness of my server when serving regular API requests? Should I use one dedicated EC2 instance just for handling API calls from clients, and another dedicated instance just for downloading and uploading? This would further complicate the design, as I would also have to manage the communication between the two instances.

Upvotes: 3

Views: 185

Answers (2)

Tim Rodger

Reputation: 11

How about allowing the client to upload files directly to S3?

Your application would generate a pre-signed URL, so that you can control which users can upload files; after that, the client interacts directly with S3. This would eliminate the costly "download then upload" process in steps 1 and 2.

See this document: http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlUploadObject.html
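Generating such a pre-signed upload URL with boto3 could look like this (a minimal sketch; bucket name, key, and expiry are placeholders):

```python
def presigned_upload_url(bucket: str, key: str, expires_in: int = 900) -> str:
    """Return a URL the client can HTTP PUT the file to directly."""
    import boto3  # lazy import: only needed when actually talking to AWS

    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # seconds the URL stays valid
    )
```

The client then uploads with a plain HTTP PUT to the returned URL, so the file never passes through your EC2 instance.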

Upvotes: 1

some ideas

Reputation: 74

Can you use more AWS services? Are you aware of AWS Lambda? https://aws.amazon.com/lambda/details/

It can perform actions in response to events; for example, it could delete a file from S3 shortly after it is downloaded: http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html This removes the need to track downloads and delete files yourself, once you get past the Lambda learning curve. It can also handle other processing, so you only have to upload to S3 from EC2.
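A deletion handler along these lines might work (a sketch; it assumes an S3 event notification is wired to the function, and note that standard S3 notifications fire on object creation and removal, not on downloads, so triggering "after download" would need something like CloudTrail data events instead):

```python
def lambda_handler(event, context):
    """Delete every S3 object referenced in the incoming event records."""
    records = event.get("Records", [])
    if not records:
        return {"deleted": 0}
    import boto3  # bundled by default in the Lambda Python runtime

    s3 = boto3.client("s3")
    deleted = 0
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        s3.delete_object(Bucket=bucket, Key=key)
        deleted += 1
    return {"deleted": deleted}
```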

Regarding cost, S3 has different storage classes, and "Reduced Redundancy" might be sufficient for your needs, saving a little money.
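Choosing the storage class is a one-argument change at upload time, sketched below (note that Reduced Redundancy is now a legacy class and may no longer be priced below Standard, so check current pricing first):

```python
def upload_reduced_redundancy(local_path: str, bucket: str, key: str) -> None:
    """Upload a file to S3 using the Reduced Redundancy storage class."""
    import boto3  # lazy import so the sketch can be read without boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        local_path,
        bucket,
        key,
        ExtraArgs={"StorageClass": "REDUCED_REDUNDANCY"},
    )
```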

Upvotes: 2
