Reputation: 3497
I'm just like many others is thinking about correct approach to shard my collections in Mongo. Main question is - how does auto-sharding work?
The official doc says - "MongoDB scales horizontally via an auto-sharding (partitioning) architecture" and "To partition a collection, we specify a shard key pattern." with note "It is important to choose the right shard key for a collection" :).
http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-ShardKeys
http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key
Now the question is - "is this right key"(sharding by ObjectID)?
db.runCommand({ shardcollection : "test", key : { _id : 1 }})
What happens internally in Mongo for ? How Mongo will split data to chunks in this case? Assuming i initially have 10mln of records with 2 shard servers - what happens on Mongo side when I'd like to add 2 more shard server when collection reaches 20mln records? I could not find that level of details anywhere on Mongo-related sources.
Taking into account random nature of autogenerated _id and it's structure,
... http://www.mongodb.org/display/DOCS/Object+IDs ...
i would shard by the least significant byte (rtl order) with chunks split by value of 2-3 bytes - this would provide easy way to shard by 2^N of shard servers - 2, 4, 8, .., 256 shard servers with more-or-less even load on each shard and with minimal required configuration. As far as i understand Mongo supports only sharding/chunking by explicitly defined ranges and that my idea will not work. Is is true?
Upvotes: 12
Views: 11242
Reputation: 224
A new exciting feature in version 2.4 is that Hashed index is supported, and can be used as Shard Keys. So the answer to your main question "Sharding by ObjectID, is it the right way?" may be yes now!
More references are in the official docs:
Hashed Shard Keys
http://docs.mongodb.org/manual/core/sharded-cluster-internals/#hashed-shard-keys
Hashed Index
http://docs.mongodb.org/manual/core/indexes/#hashed-index
Upvotes: 18
Reputation: 678
It's generally not a good idea to use the default object id as the shard key since it has an embedded timestamp and monotonically increases in time. This may work fine if you do a lot of updates such that it touches old and new documents in an evenly distributed fashion. However, this is really bad news if your application is heavy on inserts since majority of your writes will go to a single shard. This is because the writes will go to the shard that owns the [nearCurrentTimestamp -> infinity] chunk.
Each mongos monitors write traffic to shards and use a very simple heuristic to determine if a chunk has become too big and needs to be split (threshold size is configurable via chunkSize).
When you add a new shard to the cluster, the balancer (http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingAdministration-Balancing) will see a chunk imbalance and will start migrating chunks to the new shards.
Mongo supports range based sharding, however, that does not mean that the ranges are fixed since chunks can be split into smaller ranges and moved around the cluster over time.
Upvotes: 19