MauriRamone
MauriRamone

Reputation: 503

Why in my mongodb all the collections are in the same replica set?

I have a 2 nodes cluster with mongodb 3.2. I have made it only for testing purposes. The system has 2 replica sets.

Using python and the pymongo driver, I created a database ('test') and enabled the sharding. Then I created 10000 collections (To each of them I created a composite sharded key and enabled it to sharding). Then, to each collection, insert only one document.

The commands that I used (and worked correctly) are:

mongoClient = MongoClient('xx.xx.xx.xx:27017')
db = mongoClient.admin
db.command('enableSharding', 'test')
for i in range(0,10000):
    col = "test." + str(i)
    db.command({'shardCollection': col, 'key': {'ValueX': 1, 'ValueY': 1}})

db = mongoClient['test']

with open('doc.json') as json_data:
    post = json.load(json_data)

for i in range(0,10000):
    col = db[str(i)]
    col.insert(post)

My doubt: In mongo shell, I used db.stats() to get information about the 'test' database. I found that all collections were in the same replica set (I expected to find 5000 in each).

Surely I have a misconception of how collections are stored in a distributed system with mongodb, but I'm not realizing which.

enter image description here

I leave a reference image.

I hope someone can help me understand.

regards,

Upvotes: 0

Views: 569

Answers (1)

JJussi
JJussi

Reputation: 1580

Sharding collection means that collection is at all shards and chunks where documents are distributed evenly to all shards. So, if we have the one collection where we have 10.000 documents at 100 chunks, those chunks are distributed evenly. With two shards, both shards have 50 chunks and one chunk have "about" 100 documents.

Chunk is a range of documents wich sharding key value is at certain limits. So, if our sharding key type is integer, limits of one chunk could be lower:50, upper:75 and all those documents where that key is between those values (excluding upper limit) are stored at that specific chunk.

In this your case, collection has been created to "primary" shard and because every collection have only one chunk, all collections are at same shard and those cannot be moved (by auto balancer)

With command sh.status() you get sharding information.

Upvotes: 1

Related Questions