Sash
Sash

Reputation: 4598

Controlled data sharding in MongoDB

I am new to MongoDB and I have very basic knowledge of its concepts of sharding. However I was wondering if it is possible to control the split of data yourself? For example a part of the records would be stored on one specific shard? This will be used together with a rails app.

Upvotes: 3

Views: 888

Answers (4)

Aafreen Sheikh
Aafreen Sheikh

Reputation: 5064

Probably http://www.mongodb.org/display/DOCS/Tag+Aware+Sharding addresses your requirement in v2.2
Check out Kristina Chodorow's blog post too for a nice example : http://www.kchodorow.com/blog/2012/07/25/controlling-collection-distribution/

Upvotes: 0

Aravind.HU
Aravind.HU

Reputation: 9472

Why do you want to split data yourself if mongo DB is automatically doing it for you , You can upgrade your rails application layer to talk to mongos instance so that mongos routes the call for any CRUD operation to the place where the data resides . This is achieved using config server .

Upvotes: -2

Mark Hillick
Mark Hillick

Reputation: 6973

The decision to shard is a complex decision and one that you should put a lot of thought into.

There's a lot to learn about sharding, and much of it is non-obvious. I'd suggest reviewing the information at the following links:

In the context of a shard cluster, a chunk is a contiguous range of shard key values assigned to a particular shard. By default, chunks are 64 megabytes (unless modified as per above). When they grow beyond the configured chunk size, a mongos splits the chunk into two chunks. MongoDB chunks are logical and the data within them is NOT physically located together.

As I've mentioned the balancer moves the chunks around, however, you can do this manually. The balancer will take the decision to re-balance and request a chunk migration if there is a large enough difference ( minumum of 8) between the number of chunks on each shard. The actual moving of the chunks is co-ordinated between the "From" and "To" shard and when this is finished, the original chunks are removed from the "From" shard and the config servers are informed.

Quite a lot of people also pre-split, which helps with their migration. See here for more information.

In order to see documents split among the two shards, you'll need to insert enough documents in order to fill up several chunks on the first shard. If you haven't changed the default chunk size, you'd need to insert a minimum of 512MB of data in order to see data migrated to a second chunk. It's often a good idea to to test this and you can do this by setting your chunk size to 1MB and inserting 10MB of data. Here is an example of how to test this.

Upvotes: 3

Ross
Ross

Reputation: 18101

You can turn off the balancer to stop auto balancing:

sh.setBalancerState(false)

If you know the range of the key you are splitting on you could also presplit your data ranges to the desired servers see PreSplitting example. The management of the shard would be done via the javascript shell and not via your rails application.

You should take care that no shard gets more load (becomes hot) and that is why there is auto balancing by default, using monitoring like the free MMS service will help you monitor that.

Upvotes: 5

Related Questions