Sharding by ObjectID, is it the right way?

Question

Like many others, I'm thinking about the correct approach to sharding my collections in Mongo. The main question is: how does auto-sharding work?

The official docs say "MongoDB scales horizontally via an auto-sharding (partitioning) architecture" and "To partition a collection, we specify a shard key pattern.", with the note "It is important to choose the right shard key for a collection" :).
http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-ShardKeys
http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key

Now the question is: is this the right key (sharding by ObjectID)?

db.runCommand({ shardcollection : "test", key : { _id : 1 }})

What happens internally in Mongo for this command? How will Mongo split the data into chunks in this case? Assuming I initially have 10 million records on 2 shard servers, what happens on the Mongo side when I add 2 more shard servers once the collection reaches 20 million records? I could not find that level of detail in any Mongo-related sources.

Taking into account the random nature of the autogenerated _id and its structure,

... http://www.mongodb.org/display/DOCS/Object+IDs ...

I would shard by the least significant byte (right-to-left order), with chunks split by the value of 2-3 bytes. This would provide an easy way to shard across 2^N shard servers (2, 4, 8, ..., 256) with a more or less even load on each shard and minimal required configuration. As far as I understand, Mongo supports only sharding/chunking by explicitly defined ranges, so my idea will not work. Is that true?
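
To make the idea concrete, here is a minimal sketch of the mapping I have in mind, written as plain JavaScript. The function name shardByLastByte is hypothetical and this is not something Mongo provides; it only shows how the last byte of a 24-character hex ObjectID could pick one of 2^N shards.

```javascript
// Hypothetical helper (not a MongoDB API): pick a shard from the least
// significant byte of an ObjectID given as a 24-character hex string.
function shardByLastByte(objectIdHex, numShards) {
  if (!/^[0-9a-fA-F]{24}$/.test(objectIdHex)) {
    throw new Error("expected a 24-character hex ObjectID");
  }
  // The last two hex digits are the least significant byte (0..255).
  const lastByte = parseInt(objectIdHex.slice(-2), 16);
  // For numShards = 2, 4, 8, ..., 256 this keeps only the low bits.
  return lastByte % numShards;
}

// Example: the last byte of this ID is 0x11 (17), so with 4 shards it maps to shard 1.
console.log(shardByLastByte("507f1f77bcf86cd799439011", 4));
```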

treemanz · Accepted Answer

An exciting new feature in version 2.4 is support for hashed indexes, which can be used as shard keys. So the answer to your main question, "Sharding by ObjectID, is it the right way?", may now be yes!
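
In the mongo shell, a hashed shard key on _id would be declared with something like sh.shardCollection("test.mycollection", { _id: "hashed" }), where the namespace is only a placeholder. As a rough illustration of why this helps, the sketch below is plain JavaScript for Node.js, not MongoDB's actual implementation. It relies on the fact that an ObjectID starts with a 4-byte timestamp, so newly generated IDs sort after older ones. The helpers fakeObjectId, rangeShard and hashedShard are hypothetical, the equal-width ranges are a simplification of real chunk splitting, and the MD5-based hash is only a stand-in for the server's hash function.

```javascript
const crypto = require("crypto");

// Hypothetical helper: build an ObjectID-like 24-character hex string whose
// first 4 bytes are a timestamp in seconds, followed by filler bytes.
function fakeObjectId(seconds, counter) {
  return seconds.toString(16).padStart(8, "0") +
         counter.toString(16).padStart(16, "0");
}

// Simplified range placement: cut the timestamp span into equal-width ranges.
function rangeShard(id, minTs, maxTs, numShards) {
  const ts = parseInt(id.slice(0, 8), 16);
  const width = (maxTs - minTs + 1) / numShards;
  return Math.min(numShards - 1, Math.floor((ts - minTs) / width));
}

// Illustrative hashed placement: MD5 of the ID, reduced modulo the shard count.
function hashedShard(id, numShards) {
  const digest = crypto.createHash("md5").update(id).digest();
  return digest.readUInt32BE(0) % numShards;
}

const numShards = 4;
const startTs = 1300000000; // arbitrary starting timestamp for the demo
const ids = [];
for (let i = 0; i < 4000; i++) {
  ids.push(fakeObjectId(startTs + i, i));
}

// Count where the 1000 most recently generated IDs would land.
const rangeCounts = new Array(numShards).fill(0);
const hashCounts = new Array(numShards).fill(0);
for (const id of ids.slice(-1000)) {
  rangeCounts[rangeShard(id, startTs, startTs + 3999, numShards)]++;
  hashCounts[hashedShard(id, numShards)]++;
}

console.log("range placement: ", rangeCounts);
console.log("hashed placement:", hashCounts);
```

With range placement, all of the newest IDs fall into the last range, so one shard takes every recent insert. With hashed placement, they are spread across all shards.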

More references are in the official docs:

Hashed Shard Keys

http://docs.mongodb.org/manual/core/sharded-cluster-internals/#hashed-shard-keys

Hashed Index

http://docs.mongodb.org/manual/core/indexes/#hashed-index
