Reputation: 1190
I have a MongoDB 2.4.8 cluster. My software dynamically partitions data, and I now have about 30,000 sharded collections. The cluster currently contains only one shard (which is a replica set); it is set up as a sharded cluster to allow easy future expansion.
When I start a new mongos process and run show collections, it takes several hours to complete. During this time the mongos is unresponsive to all clients (but the cluster is fine). If I never run show collections, all other operations through the mongos work normally.
Eventually show collections completes, and after that the mongos works fine; running show collections again on the same mongos returns right away. I only found out there was a problem when I needed to restart a mongos for the first time in many months, during which the collection count rose greatly.
Logically, it would seem that data transfer (about collection chunks) from the config servers to the new mongos is the bottleneck. But neither side shows high CPU or network activity while this is going on.
Is this known behavior? How can I further investigate the problem?
Upvotes: 3
Views: 124
Reputation: 1190
I traced the problem to a faulty config server. After replacing it, everything is working fine again.
Details: the bad server didn't respond to queries, so they were re-sent to the other servers. This added latency to every request to the config servers, which was most pronounced in the 'show collections' operation: it does at least one roundtrip per collection between mongos and the config servers, and does them all serially.
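To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The 1 ms healthy roundtrip and the 0.5 s delay before a request is re-sent are assumptions I picked for illustration, not values I measured, and the function name is hypothetical. With those numbers, 30,000 serial roundtrips come out at roughly four hours.

```python
def serial_listing_time(num_collections, healthy_rtt_s, failed_attempt_delay_s,
                        roundtrips_per_collection=1):
    """Estimate wall-clock seconds for a strictly serial listing.

    Each roundtrip pays the normal latency plus whatever time is lost
    waiting on an unresponsive server before the request is re-sent.
    """
    per_request = healthy_rtt_s + failed_attempt_delay_s
    return num_collections * roundtrips_per_collection * per_request


# Assumed values (illustrative only): 1 ms healthy roundtrip,
# 0.5 s lost per request while the bad config server fails to answer.
healthy = serial_listing_time(30_000, 0.001, 0.0)
degraded = serial_listing_time(30_000, 0.001, 0.5)

print(f"healthy:  {healthy:.0f} s")
print(f"degraded: {degraded / 3600:.1f} h")
```

Because the requests are serial, the delay multiplies by the collection count instead of overlapping. It would also be consistent with the low CPU and network activity I saw, since nearly all of the time is spent waiting rather than transferring data.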
Upvotes: 1