atimb

Reputation: 1092

Expected balancing behavior when restoring data with mongorestore to a sharded cluster

I have noticed that when restoring data with mongorestore to a sharded cluster through mongos, all records are initially written to the collection's primary shard; only the balancer process then moves the chunks, which is relatively slow. So right after the restore I see a situation like this:

chunks:
    rs_shard-1  28
    rs_shard-2  29
    rs_shard-4  27
    rs_shard-3  644

I don't have any errors in the mongodb/mongos log files.

I'm not sure, but I think that in the past data was restored in an already balanced way. I'm now using version 2.4.6. Can someone confirm what the expected behavior is?

Upvotes: 2

Views: 976

Answers (1)

Markus W Mahlberg

Reputation: 20703

Here is what I believe happens:

When restoring, initial chunk ranges are assigned to each shard. mongorestore then inserts the data without waiting for any acknowledgement from mongos, let alone from the shards, so the documents are inserted relatively fast.

I assume you have a monotonically increasing shard key, such as ObjectId. In that case, one shard was assigned the range from some value X up to infinity (called "maxKey" in mongoland) during the initial assignment of chunk ranges. Every new document falls into that range and is therefore created on that one shard, resulting in a lot of chunk splits and an ever-growing number of chunks on that server. Each chunk split triggers a balancer round, but since inserting new documents is faster than migrating chunks, the number of chunks grows faster than the balancer can redistribute them.
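The mechanism can be illustrated with a small Node.js sketch (the chunk ranges and key values below are made up for illustration, not MongoDB internals): a document is routed to the chunk whose range contains its shard key, so with a monotonically increasing key every new document lands in the one chunk that ends at maxKey.

```javascript
// Chunks partition the shard key space into ranges; a document is routed
// to the chunk whose range contains its key. (Hypothetical ranges below.)
const MAX_KEY = Infinity; // stands in for MongoDB's maxKey

const chunks = [
  { shard: "rs_shard-1", min: -Infinity, max: 250 },
  { shard: "rs_shard-2", min: 250, max: 500 },
  { shard: "rs_shard-4", min: 500, max: 750 },
  { shard: "rs_shard-3", min: 750, max: MAX_KEY }, // owns [X, maxKey)
];

function targetShard(key) {
  return chunks.find(c => key >= c.min && key < c.max).shard;
}

// A monotonically increasing shard key: every new key sorts above all
// existing keys, so every insert hits the top chunk on one shard.
const counts = {};
for (let key = 1000; key < 2000; key++) {
  const shard = targetShard(key);
  counts[shard] = (counts[shard] || 0) + 1;
}
console.log(counts); // all 1000 inserts land on rs_shard-3
```

That one chunk keeps splitting, but every split produces children that still live on the same shard until the balancer migrates them away.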

So the first thing I would do is check the shard key. I am pretty sure it is monotonically increasing, which is bad not only when restoring a backup, but in production use, too. Please see the shard key documentation and Considerations for Selecting Shard Keys in the MongoDB docs.

A few additional notes. The mongodump utility is designed for small databases, such as the config database of a sharded cluster. Your database is roughly 46.5 GB, which isn't exactly small. I would rather take file system snapshots on each individual shard, synchronized via a cron job. If you really need point-in-time recovery, you can still run mongodump in direct file access mode against the snapshotted files to create a dump, and restore those dumps using the --oplogLimit option. Other than enabling point-in-time recovery, mongodump has no advantage over file system snapshots; it has the disadvantage that you have to stop the balancer to get a consistent backup, and lock the database for the whole backup procedure to get a true point-in-time recovery option.
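The snapshot procedure could look roughly like the sketch below for one shard. All hostnames, volume names, and sizes are placeholders, and the LVM commands assume the dbpath lives on an LVM volume; adapt everything to your environment before use.

```shell
#!/bin/sh
# Hypothetical snapshot-based backup of one shard member.

# 1. Disable the balancer via mongos so no chunk migration runs mid-backup
mongo --host mongos.example.net --eval 'sh.setBalancerState(false)'

# 2. Flush journaled writes and block new ones on the shard's secondary
mongo --host shard1-secondary.example.net --eval 'db.fsyncLock()'

# 3. Snapshot the volume holding the dbpath (LVM shown as an example)
lvcreate --size 10G --snapshot --name "mongo-backup-$(date +%F)" /dev/vg0/mongodb

# 4. Unlock the member and re-enable the balancer
mongo --host shard1-secondary.example.net --eval 'db.fsyncUnlock()'
mongo --host mongos.example.net --eval 'sh.setBalancerState(true)'
```

Running this from cron at the same time on every shard gives you backups that are at least roughly synchronized across the cluster.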

Upvotes: 1
