pr0logas
pr0logas

Reputation: 693

MongoDB sharded cluster. How to distribute the disk storage between the nodes?

We successfully created a working MongoDB cluster with the following setup:

All separate instances are identical in terms of CPU, MEM, disk space, etc.

The cluster itself seems to work great. We are using these sharding commands in order to distribute the shards:

sh.shardCollection("blockchains.ethereum_balance",{"dapp_id":"hashed"})
sh.shardCollection("blockchains.ethereum_daily",{"to":"hashed"})

etc.

However, the storage distributions seem not equal or at least not efficient:

mongodb_shards_disk_usage

Questions:

  1. If we add a new shard, does the new node get some information from "older" nodes? (The "old" shards are moving partial data)
  2. How to manage the storage distribution in this case?

Any ideas appreciated.

EDIT:

It seems that newly created shards have just a portion of some collections. Automatic migration takes time.

In addition, indexes are kept only per one shard only. In order Mongo to move some chunks to newly created shards indexes should be in target shards too.

Upvotes: 0

Views: 219

Answers (1)

Wernfried Domscheit
Wernfried Domscheit

Reputation: 59456

In a sharded cluster you have the Sharded Cluster Balancer which takes care about distributing the data evenly. By default you don't have to do anything manually.

Check the sharding status with sh.status() and db.ethereum_balance.getShardDistribution() / db.ethereum_daily.getShardDistribution()

Perhaps you selected a poor Shard Key, see Choosing a Shard Key. Do you have a big unsharded collection in your database?

Upvotes: 1

Related Questions