Archanaa Panda
Archanaa Panda

Reputation: 201

MongoDB : Application generated _id containing country and sequence as shard key?

I have some very basic questions regarding shard key.
In our application, we are creating _id fields from a coarse grained attribute -country and a per country monotonically increasing sequence number eg IN_1. As per several online references and books (eg http://www.kchodorow.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/), it is better to have a compound shard key - coarse grained key + search key. We also have a country attribute in the shard key and have an index on that.

Most, if not all of our queries on these collections are going to be country search based or _id based.

What would be the best choice for a shard key

  1. just _id - since ctry is already baked into it - will it make queries starting with ctry slow?
  2. {ctry: 1, _id: 1} - but part of my _id is monotonically increasing sequence.
  3. {ctry: 1, _id: hashed} - Promises both read locality and write distribution. Is this supported by MongoDB?
  4. just {_id: hashed} - will it make queries starting with ctry slow?

I am inclined to using the option 3- but is that possible - it is not very clear from MongoDB docs.

Upvotes: 1

Views: 145

Answers (1)

wdberkeley
wdberkeley

Reputation: 11671

just _id - since ctry is already baked into it - will it make queries starting with ctry slow?

I don't recommend this. _id is monotonically increasing so it will concentrate writes on a single shard. If you are also doing queries with the shape

{ "ctry" : "United States" }

then they will be broadcast to all shards, not targeted.

{ctry: 1, _id: 1} - but part of my _id is monotonically increasing sequence.

_id is monotonically increasing but, as long as you are inserting documents with shard key values containing different values for ctry, the shard key values are not monotonically increasing, so you will not be concentrating writes on a single shard. However, for a given country c, all writes will go to only one of the shards containing chunks with ctry = c.

This seems reasonable.

{ctry: 1, _id: hashed} - Promises both read locality and write distribution. Is this supported by MongoDB?

I like this best. It supports your queries, provides read isolation, and spreads out writes. Hashed shard keys in MongoDB are built on a single field, but you can compute your own hash hashed_id and store that on the document, then shard on

{ "ctry" : 1, "hashed_id" : 1 }

just {_id: hashed} - will it make queries starting with ctry slow?

The monotonicity problem is gone but ctry queries will still be scatter-gather.

Upvotes: 1

Related Questions