Reputation: 5554
Based on this resource, adding a pseudo-random prefix to an S3 key will increase your GET performance over having a constant prefix.
So a key of the form:
bucket/$randomPrefix-key.txt
Will perform better in GETs than
bucket/$date-key.txt
It also implies that the common prefix portion doesn't matter. From the article:
You can optionally add more prefixes in your key name, before the hash string, to group objects. The following example adds animations/ and videos/ prefixes to the key names.
examplebucket/animations/232a-2013-26-05-15-00-00/cust1234234/animation1.obj
examplebucket/animations/7b54-2013-26-05-15-00-00/cust3857422/animation2.obj
examplebucket/animations/921c-2013-26-05-15-00-00/cust1248473/animation3.obj
examplebucket/videos/ba65-2013-26-05-15-00-00/cust8474937/video2.mpg
examplebucket/videos/8761-2013-26-05-15-00-00/cust1248473/video3.mpg
examplebucket/videos/2e4f-2013-26-05-15-00-01/cust1248473/video4.mpg
examplebucket/videos/9810-2013-26-05-15-00-01/cust1248473/video5.mpg
examplebucket/videos/7e34-2013-26-05-15-00-01/cust1248473/video6.mpg
examplebucket/videos/c34a-2013-26-05-15-00-01/cust1248473/video7.mpg
...
So a key of the form
bucket/foo/bar/baz/$randomPrefix-key.txt
Will apparently work just as well as the first form above (bucket/$randomPrefix-key.txt).
My question: what if the pseudorandom prefix is in the middle of the key? Does that work just as well?
For example:
bucket/foo/bar/baz-$pseudoRandomString-key.txt
Upvotes: 1
Views: 87
Reputation: 179124
Your example is no different than the ones in the documentation, for an important reason: slashes (/) have no intrinsic meaning to S3.
There are no folders in S3. foo/bar.txt and foo/baz.jpg are not "in the same folder."
Technically, they are just two objects whose keys have a common prefix.
The console displays them in a folder, only for organizational convenience.
Amazon S3 has a flat structure with no hierarchy like you would see in a typical file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects.
http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
Also:
The Amazon S3 data model does not natively support the concept of folders, nor does it provide any APIs for folder-level operations. But the Amazon S3 console supports folders to help you organize your data.
http://docs.aws.amazon.com/AmazonS3/latest/UG/about-using-console.html
Thus the / has no special meaning to the S3 index, and no special meaning relative to the placement of your random prefix.
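To make the "flat keys, folders only for display" point concrete, here is a minimal local sketch of how a folder view can be derived from a flat set of keys using a prefix and a delimiter. It runs purely in memory and is not S3's implementation; the function name list_level and the sample keys are made up for illustration.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Emulate a prefix/delimiter listing over a flat collection of keys.

    Returns (objects directly at this level, "folder" prefixes one level down).
    The keys themselves stay plain strings; the grouping is computed on demand.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The delimiter appears later in the key: show it as a "folder".
            common_prefixes.add(prefix + head + sep)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


keys = ["foo/bar.txt", "foo/baz.jpg", "readme.txt"]
print(list_level(keys))          # (['readme.txt'], ['foo/'])
print(list_level(keys, "foo/"))  # (['foo/bar.txt', 'foo/baz.jpg'], [])
```

Choosing a different delimiter, such as "-", would produce a different "folder" view of exactly the same keys, which is why the slash carries no special weight.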
However, it's important that the characters before the random prefix remain the same, so that partition splits can be accomplished right at the beginning of the random characters.
S3 must be able to split the list of keys beginning with the first random character and find a balance of work to the left of (<) and right of (>=) the split point.
If you have this...
fix/ed/chars/here-then-$random/anything/here
...then S3 says to itself "hmm... it looks like example-bucket/fix/ed/chars/here-then-* seems to be taking a lot of traffic, but it looks like the next character is always one of 0 1 2 3 4 5 6 7 8 9 a b c d e f and they're pretty well balanced, so I'm going to split it at "8," so that ...then-0* through ...then-7* is in one partition and ...then-8* through ...then-f* is in another" and #boom, potential performance bottleneck solved.
The partitioning is completely automatic and transparent.
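A rough way to see why a split at "8" balances the load is to generate some keys and count how many fall on each side. This is only a sketch under assumptions: S3's real partitioning logic is not public, the 4-character SHA-256 fragment is a hypothetical stand-in for $random, and make_key and split_counts are illustrative names.

```python
import hashlib

# Constant leading characters, matching the example key layout above.
FIXED = "fix/ed/chars/here-then-"


def make_key(name: str) -> str:
    """Build a key whose random part is 4 hex chars derived from the name (hypothetical scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{FIXED}{digest[:4]}/{name}"


def split_counts(keys, split_point):
    """Count keys lexically below (<) and at or above (>=) the split point."""
    left = sum(1 for k in keys if k < split_point)
    return left, len(keys) - left


keys = [make_key(f"object-{i}.txt") for i in range(10_000)]
left, right = split_counts(keys, FIXED + "8")
print(left, right)  # the two counts should be roughly equal
```

Because every key shares the same leading characters, a single split point right after them divides the whole keyspace; the hex characters 0-7 sort below "8" and 8-f sort at or above it.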
Here's an example of what not to do.
logs/2017-01-23/$random/...
logs/2017-01-24/$random/...
logs/2017-01-25/$random/...
Here, a hot spot develops in a different prefix each day, giving S3 no good options for creating effective partition splits to alleviate any overload. In this case, any split would, at some point, end up to the left of (lexically less than) all future uploads, so it would not be an effective split. By contrast, the split above puts about half the workload < and the other half >= a split at a single character.
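The same counting exercise shows why the date-first layout defeats a split. The sketch below is an illustration only, not a model of S3 internals: it places a hypothetical split in the middle of the first day's keys, uses a simple hex counter as a stand-in for $random, and then checks where later days' keys land.

```python
from datetime import date, timedelta


def daily_keys(day: date, count: int = 256):
    """Keys for one day in the logs/<date>/<random>/... layout (hex counter stands in for $random)."""
    return [f"logs/{day.isoformat()}/{i:04x}/entry" for i in range(count)]


def fraction_right(keys, split_point):
    """Fraction of keys that sort at or above (>=) the split point."""
    return sum(1 for k in keys if k >= split_point) / len(keys)


start = date(2017, 1, 23)
first_day = sorted(daily_keys(start))
# Hypothetical split chosen while the first day was the hot spot.
split_point = first_day[len(first_day) // 2]

print(start, fraction_right(first_day, split_point))
for offset in range(1, 4):
    day = start + timedelta(days=offset)
    print(day, fraction_right(daily_keys(day), split_point))
```

On the first day the split divides the keys evenly, but every later date sorts after it, so all subsequent traffic lands on one side of the split.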
Also worth noting: unless you expect a sustained workload of at least 100 requests per second, this isn't going to give you any benefit at all. Natural randomness in your keyspace may also suffice. S3 reads can also scale essentially indefinitely without these optimizations when coupled with CloudFront, which is usually faster and often slightly cheaper, since CloudFront bandwidth pricing is slightly lower than S3's in some areas (presumably because it relieves potential congestion on the Internet connections at the S3 regions). When S3 is connected to CloudFront, S3 rates its bandwidth charges at $0.00/GB Out to the Internet, and CloudFront bills that piece, at its rates, instead of S3.
Upvotes: 2