razor

Reputation: 2877

Amazon S3, storing large number of files (millions, and many TB of data)

I'll have to store millions of files (many TB of data in the future) in S3. Are there any limitations? (Not price :) ); I'm asking about architectural limitations (e.g. "don't store them this way, another way will be better/faster"). My files are in a hierarchy

/{country}/{number}/{code}/docs

and I checked that I can keep them that way (to access them easily through REST). (Of course I know S3 stores them internally in a different way; that's not important to me.) So, are there any limitations/pitfalls?
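To make the layout concrete, here is a minimal sketch of how I'd write and read under this key scheme with boto3; the bucket name and the example values are just placeholders:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-docs-bucket"  # placeholder bucket name

def doc_key(country: str, number: str, code: str, filename: str) -> str:
    # S3 has no real folders; the "/" characters are simply part of the key string.
    return f"{country}/{number}/{code}/docs/{filename}"

key = doc_key("PL", "12345", "ABC", "invoice.pdf")  # example values
s3.put_object(Bucket=BUCKET, Key=key, Body=b"...file bytes...")

# The same key can later be fetched directly over REST / GetObject:
obj = s3.get_object(Bucket=BUCKET, Key=key)
```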

Upvotes: 6

Views: 12148

Answers (2)

Anatoly

Reputation: 15530

AWS S3 definitely does have access limits, around 100 requests/sec for keys that share a similar path prefix; see the official docs: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

On the other hand, a hierarchical approach makes the logic more complicated. The trade-off depends on your requirements; one good option is to put a key of at least 4 characters (a primary id or hash key) at the front of the URL. If you have a limited number of countries, try using multiple buckets with the country code as the bucket name; that also helps to pin a specific physical location if required.
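A minimal sketch of the hash-prefix idea, under the assumption that "4 symbols in front of the URL" means prepending a short, evenly distributed hash of the key (the helper name and values are illustrative, not from the answer):

```python
import hashlib

def prefixed_key(country: str, number: str, code: str, filename: str) -> str:
    # Build the logical key, then prepend 4 hex characters of its MD5 so that
    # keys no longer share one hot prefix (spreads request load across partitions).
    logical = f"{country}/{number}/{code}/docs/{filename}"
    prefix = hashlib.md5(logical.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}/{logical}"

print(prefixed_key("PL", "12345", "ABC", "invoice.pdf"))
# -> something like "ab12/PL/12345/ABC/docs/invoice.pdf" (the prefix depends on the hash)
```

The cost of this scheme is exactly the complication mentioned above: you can no longer reconstruct the key from the country/number/code alone without re-computing the hash, so the prefixing logic has to live wherever keys are generated.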

Upvotes: 5

greg_diesel

Reputation: 3005

S3 has no limits that you would hit. The files are not really in folders; the "paths" are just strings used as keys. Make the folder structure something that is easy for you to keep track of and organize.

You do NOT want to be listing the "folder" contents in S3 to find things. S3 is slow at giving directory listings, because they are not really directories.

You should either store the whole path /{country}/{number}/{code}/docs in a database, or make the logic repeatable enough that you can be confident the file will be at that location.
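A sketch of the "repeatable logic" approach, assuming the country/number/code values are already in your database: rebuild the key deterministically and fetch the object directly, with no ListObjects call (bucket and function names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-docs-bucket"  # hypothetical bucket name

def fetch_doc(country: str, number: str, code: str, filename: str) -> bytes:
    # Deterministic key construction: no directory listing is ever needed.
    key = f"{country}/{number}/{code}/docs/{filename}"
    response = s3.get_object(Bucket=BUCKET, Key=key)
    return response["Body"].read()
```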

James Brady gave an excellent and very detailed answer on how S3 treats file storage in a question here: https://stackoverflow.com/a/394505/4179009

Upvotes: 4
