knightcool

Reputation: 348

Need for metadata store while storing an object

While checking out the design of a service like pastebin, I noticed the usage of two different storage systems:

  1. An object store (such as Amazon S3) for storing the actual "paste" data
  2. A metadata store for other things pertaining to that "paste" data, such as the URL hash (used to access the paste data), a reference to the actual paste data, etc.

I am trying to understand the need for this metadata store.

Is this generally the recommended way? Any specific advantage we get from using the metadata store?

Do object storage systems NOT allow metadata to be stored along with the actual object in the same storage server?

Upvotes: 0

Views: 550

Answers (2)

Ujjwal Vaish

Reputation: 375

Great answer above. Just to add on: two more advantages are caching and scaling both storage systems individually.

  1. If you use only an object store and a paste is, say, 5 MB, would you cache all of it? A metadata store also lets you improve UX by caching, say, the first 10 or 100 KB of a paste for the user to preview while the complete object is fetched in the background. This upper bound also helps you size the cache deterministically.
  2. You can also scale the object store and the metadata store independently of each other according to performance and capacity needs. Lookups in the metadata store will also be quicker since it is much less bulky.
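To make point 1 concrete, here is a minimal sketch, assuming in-memory dicts stand in for the object store and the metadata store. The names `PasteService`, `PasteMeta`, and `PREVIEW_BYTES` are made up for illustration and are not from any real service. The preview lives with the metadata, so a preview request never touches the large object.

```python
from dataclasses import dataclass

# Assumed preview size; the answer above suggests 10 or 100 KB.
PREVIEW_BYTES = 10 * 1024


@dataclass
class PasteMeta:
    url_hash: str     # short key that appears in the paste URL
    object_key: str   # where the full body lives in the object store
    size: int         # full body size in bytes
    preview: bytes    # at most PREVIEW_BYTES, so cache entries are bounded


class PasteService:
    """Hypothetical service; dicts stand in for the two real stores."""

    def __init__(self):
        self.object_store = {}    # stand-in for S3 or similar
        self.metadata_store = {}  # stand-in for a database or cache

    def put(self, url_hash: str, body: bytes) -> None:
        object_key = f"pastes/{url_hash}"
        self.object_store[object_key] = body
        self.metadata_store[url_hash] = PasteMeta(
            url_hash=url_hash,
            object_key=object_key,
            size=len(body),
            preview=body[:PREVIEW_BYTES],
        )

    def get_preview(self, url_hash: str) -> bytes:
        # Served from the small metadata record only.
        return self.metadata_store[url_hash].preview

    def get_full(self, url_hash: str) -> bytes:
        # Resolve the reference first, then fetch the large object.
        meta = self.metadata_store[url_hash]
        return self.object_store[meta.object_key]
```

In a real deployment `get_full` would be the slow call that runs in the background while the preview is already on screen.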

Your concern is legitimate that separating the storage into 2 tables (or mediums) does add some latency, but it's always a compromise with System Design, there is hardly a Win-Win situation.

Upvotes: 1

root

Reputation: 6048

Object storage systems generally do allow quite a lot of metadata to be attached to the object.

But then your metadata is at the mercy of the object store.

  • Your metadata search is limited to what the object store allows.
  • Analysis, notification (à la inotify), etc. are limited to what the object store allows.
  • If you wanted to move from S3 to Google Cloud Storage, or to do both, you'd have to normalize your metadata.
  • Your metadata size is limited to what the object store allows.
  • You can't do cross-object-store metadata (e.g. a link that refers to multiple paste data).
  • You might not be able to have binary metadata.
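As a rough illustration of the search and cross-object-store points, here is a sketch that keeps metadata in a separate database. It uses SQLite from the Python standard library; the schema, the table names (`paste`, `paste_part`), and the sample rows are assumptions made up for this example. One URL hash can refer to several objects on different backends, and you can query by any column you choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paste (
    url_hash   TEXT PRIMARY KEY,
    owner      TEXT NOT NULL,
    created_at INTEGER NOT NULL
);
CREATE TABLE paste_part (
    url_hash   TEXT NOT NULL REFERENCES paste(url_hash),
    position   INTEGER NOT NULL,
    backend    TEXT NOT NULL,   -- e.g. 's3' or 'gcs' (illustrative labels)
    object_key TEXT NOT NULL,
    PRIMARY KEY (url_hash, position)
);
""")

# Hypothetical sample data: one paste split across two object stores.
conn.execute("INSERT INTO paste VALUES (?, ?, ?)", ("abc123", "alice", 1700000000))
conn.execute("INSERT INTO paste VALUES (?, ?, ?)", ("def456", "bob", 1700000500))
conn.executemany(
    "INSERT INTO paste_part VALUES (?, ?, ?, ?)",
    [
        ("abc123", 0, "s3", "pastes/a"),
        ("abc123", 1, "gcs", "pastes/b"),
        ("def456", 0, "s3", "pastes/c"),
    ],
)


def parts_for(url_hash):
    """Return (backend, object_key) pairs for one link, in order."""
    return conn.execute(
        "SELECT backend, object_key FROM paste_part "
        "WHERE url_hash = ? ORDER BY position",
        (url_hash,),
    ).fetchall()


def pastes_by_owner(owner):
    """A search the object store itself may not offer."""
    rows = conn.execute(
        "SELECT url_hash FROM paste WHERE owner = ? ORDER BY created_at",
        (owner,),
    ).fetchall()
    return [row[0] for row in rows]
```

Moving objects from one backend to another then only means updating `backend` and `object_key` rows; the URLs and the rest of the metadata stay the same.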

Typically, metadata is both very important and very heavily used by the business, so its usage characteristics differ from those of the data, and it makes sense to put it on storage with different characteristics.

I can't find anywhere how pastebin.com makes money, so I don't know how heavily they use metadata, but merely the lookup, the translation between URL and paste data, is not something you can do securely with object storage alone.
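A sketch of that lookup step, under my own assumptions about what "securely" could involve here: an unguessable URL hash, an object key that is never exposed to the client, and an expiry check done before the object store is touched. The function names and the dict-based stores are hypothetical stand-ins.

```python
import secrets
import time

metadata_store = {}  # stand-in for the metadata database
object_store = {}    # stand-in for S3 or similar


def create_paste(body, ttl_seconds, now=None):
    """Store the body and return the public URL hash."""
    now = time.time() if now is None else now
    url_hash = secrets.token_urlsafe(8)               # unguessable public key
    object_key = f"pastes/{secrets.token_hex(16)}"    # internal, never exposed
    object_store[object_key] = body
    metadata_store[url_hash] = {
        "object_key": object_key,
        "expires_at": now + ttl_seconds,
    }
    return url_hash


def read_paste(url_hash, now=None):
    """Translate URL hash to object key, enforcing expiry first."""
    now = time.time() if now is None else now
    meta = metadata_store.get(url_hash)
    if meta is None or now >= meta["expires_at"]:
        return None  # unknown or expired: the object store is never queried
    return object_store[meta["object_key"]]
```

Because clients only ever see `url_hash`, you can add checks (ownership, rate limits, takedowns) in `read_paste` without changing anything in the object store.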

Upvotes: 2
