vegasje

Reputation: 238

Choosing a MongoDB sharding key

I am launching a data storage service for my application. MongoDB is running as the storage mechanism, and I have created 2 shards to start.

The application will be storing event data, and all data will be structured as follows:

{ 
  _id: '4fa2f7e25626cd1374000002', 
  created_at: '2012-05-03T21:25:54+00:00', 
  name: 'client_session_connect', 
  session_remote_id: '74ACF9AA-9E09-11E1-8C9E-8462380DA5E6', 
  zone_id: '74ACF9AA-9E09-11E1-8C9E-1231380DA5E6',
  additional: {
    some_other_key: 'value'
  }
} 

Events will have a variety of names, and any new event can be created at any time with a new event name. There will be plenty of events in the system with the same name. _id, created_at, and name will be part of every event, but no other values are guaranteed.

Based on what I have read (here and here), it seems that the best shard key would be { name: 1, created_at: 1 }. Is this interpretation correct?

Upvotes: 1

Views: 1223

Answers (1)

matulef

Reputation: 3266

From what you've stated, it seems like that would be a good shard key, with a few caveats:

- Shard keys are immutable, so if you ever need to change the "name" field of a document, you'll need to delete and reinsert it. This probably isn't an issue for you unless you intend to change names often.

- If you write a lot of documents with the same "name" in quick succession, all of these writes will go to the same chunk, since "created_at" is presumably an increasing field. Eventually the chunk will be split into multiple chunks and balanced off the receiving machine, so this is only a problem if you expect a huge volume of writes for documents with the same "name".

- If the names are not uniformly distributed, you could hash the name and store the result in a new field of your document, then make the shard key { hashedName: 1, created_at: 1 }. This might give a more even load distribution, reducing the amount of balancing later. It does add a little complexity to your documents, though.
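As a rough illustration of the hashed-name idea, here is a minimal Node.js sketch. The helper names hashName and withHashedName are made up for this example, and MD5 is just one possible choice of hash (it is used here only to spread keys, not for security). You would call something like this in your application before inserting each event.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a stable hex digest from the event name.
// MD5 is an assumption; any stable hash function would do.
function hashName(name) {
  return crypto.createHash('md5').update(name, 'utf8').digest('hex');
}

// Hypothetical helper: return a copy of the event with a hashedName field,
// leaving the original event object untouched.
function withHashedName(event) {
  return Object.assign({}, event, { hashedName: hashName(event.name) });
}

const event = withHashedName({
  name: 'client_session_connect',
  created_at: new Date('2012-05-03T21:25:54+00:00'),
});

// The same name always yields the same hashedName, so events with one name
// still sort together, ordered by created_at within that name.
console.log(event.hashedName, event.created_at.toISOString());
```

The original "name" field stays in the document, so you can still query on it; only the shard key uses the hashed copy.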

Assuming you're aware of these things, {name: 1, created_at: 1} may very well be the best shard key for you.
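To make the second caveat more concrete, here is a small, simplified simulation of range-based routing on { name: 1, created_at: 1 }. It does not talk to MongoDB at all: the split points are invented, chunkFor is a hypothetical helper, and the empty string stands in for the real minimum key. It only shows why new events with the same name and increasing timestamps keep landing in the same chunk until that chunk is split.

```javascript
// Compare two compound keys [name, createdAtMillis] field by field,
// the way a compound shard key is ordered: name first, then created_at.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

// Chunks are described by their inclusive lower bounds, sorted ascending.
// Return the index of the last chunk whose lower bound is <= key.
function chunkFor(lowerBounds, key) {
  let idx = 0;
  for (let i = 0; i < lowerBounds.length; i++) {
    if (compareKeys(lowerBounds[i], key) <= 0) idx = i;
  }
  return idx;
}

// Invented split points; ['', 0] is a stand-in for the minimum key.
const lowerBounds = [
  ['', 0],
  ['client_session_connect', 1000],
  ['client_session_connect', 2000],
  ['zone_update', 0],
];

// Three new events with the same name and ever-increasing timestamps.
const targets = [2500, 2600, 2700].map(
  (t) => chunkFor(lowerBounds, ['client_session_connect', t])
);

// All three writes resolve to the same chunk index.
console.log(targets);
```

With the hashed variant of the key the picture is the same within a single name, but different names are scattered across the key space rather than clustered alphabetically.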

Upvotes: 4
