pyhotshot

S3 bucket date path format for faster operations

I was told by one of the consultants from AWS itself that, when naming folders (objects) in S3 by date, I should use MM-DD-YYYY for faster S3 operations like GetObject. I usually use YYYY-MM-DD. I don't understand what difference it makes. Is there a difference, and if so, which one is better?

Answers (2)

Dennis Traub

This used to be a limitation due to the way data was stored in the back end, but it no longer applies (at least not to the original extent; see jellycsc's comment below).

The reason for this recommendation was that, in the past, Amazon Simple Storage Service (S3) partitioned data by key. When many objects share the same prefix (for example, all keys starting with the same year), performance could suffer when many of them needed to be read from the same partition.
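To make the shared-prefix problem concrete, here is a small Python sketch. It assumes one object per day under a hypothetical `YYYY-MM-DD/data.csv` naming scheme; the `daily_keys` and `keys_per_prefix` helpers and the `data.csv` file name are illustrative, not from the question. Grouping a year's worth of keys by their first four characters shows that every key lands under the same year prefix.

```python
from collections import Counter
from datetime import date, timedelta


def daily_keys(start: date, days: int, fmt: str) -> list[str]:
    """Build one hypothetical key per day, e.g. '2023-01-15/data.csv' for fmt '%Y-%m-%d'."""
    return [f"{(start + timedelta(days=d)).strftime(fmt)}/data.csv" for d in range(days)]


def keys_per_prefix(keys: list[str], length: int) -> Counter:
    """Count how many keys share each leading `length`-character prefix."""
    return Counter(k[:length] for k in keys)


keys = daily_keys(date(2023, 1, 1), 365, "%Y-%m-%d")
print(keys_per_prefix(keys, 4))  # Counter({'2023': 365})
```

This only looks at the key names; it does not reproduce how S3 actually assigns keys to partitions.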

However, since 2018, hashing or randomizing the S3 key prefix is no longer required to get improved performance: https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/

jellycsc

S3 creates so-called partitions under the hood to serve requests to the bucket. Each partition can serve 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second. S3 partitions the bucket based on the common prefixes among the object keys. The MM-DD-YYYY date format would be slightly faster than YYYY-MM-DD because keys in that format start to differ earlier (in the month and day rather than after the year), so the objects can be spread across more partitions.
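A rough way to see why MM-DD-YYYY "spreads" better is to count how many distinct leading prefixes each format produces for the same year of dates. The sketch below assumes one hypothetical `<date>/data.csv` key per day and, for the request-rate line, the simplifying assumption that every distinct prefix is served by its own partition with evenly spread load. S3's real partitioning is internal and driven by request load, so treat the ceiling as illustrative arithmetic, not a guarantee.

```python
from datetime import date, timedelta

# Per-partition GET/HEAD limit quoted in the answer above.
GET_RPS_PER_PREFIX = 5_500


def daily_keys(start: date, days: int, fmt: str) -> list[str]:
    """One hypothetical key per day, named with the given strftime format."""
    return [f"{(start + timedelta(days=d)).strftime(fmt)}/data.csv" for d in range(days)]


def distinct_prefixes(keys: list[str], length: int) -> int:
    """Number of different leading `length`-character prefixes among the keys."""
    return len({k[:length] for k in keys})


def ceiling_get_rps(prefixes: int) -> int:
    """Upper bound assuming each prefix is its own partition and load is even (assumption)."""
    return prefixes * GET_RPS_PER_PREFIX


iso = daily_keys(date(2023, 1, 1), 365, "%Y-%m-%d")  # 2023-01-01/data.csv, ...
us = daily_keys(date(2023, 1, 1), 365, "%m-%d-%Y")   # 01-01-2023/data.csv, ...

for length in (2, 5):
    print(length, distinct_prefixes(iso, length), distinct_prefixes(us, length))
# 2 1 12
# 5 1 365

print(ceiling_get_rps(distinct_prefixes(iso, 2)), ceiling_get_rps(distinct_prefixes(us, 2)))
# 5500 66000
```

With YYYY-MM-DD, all 365 keys still share one prefix after five characters; with MM-DD-YYYY they are already split by month after two.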

Key takeaway: more randomness at the beginning of the object keys will likely get you more performance out of the S3 bucket.
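One common way to put randomness at the start of a key while keeping it reproducible is to prepend a short hash of the original key. The helper below is a hypothetical sketch: the name `hashed_key`, the use of SHA-256, and the 4-character width are all assumptions, not an S3 API or a required convention.

```python
import hashlib


def hashed_key(key: str, width: int = 4) -> str:
    """Prepend a short, deterministic hex hash so keys differ in their first characters.

    Hypothetical helper: the hash function (SHA-256) and the width are assumptions.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:width]}/{key}"


# The first 4 characters vary with the date, the rest keeps the readable key.
print(hashed_key("2023-01-15/data.csv"))
```

Because the hash is deterministic, you can recompute the full key from the date when reading. The trade-off is that a plain date-range listing by prefix no longer works, and since the 2018 request-rate increase this kind of prefixing is usually unnecessary.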