Reputation: 8995
After restarting my sharded cluster, I noticed the balancer was no longer migrating any data, even though sh.isBalancerRunning()
always returned true.
I tried to run sh.stopBalancer()
and it hung forever on:
sh.stopBalancer()
Waiting for active hosts...
Waiting for the balancer lock...
Here is the balancer entry from the locks collection on the config server:
configsvr> db.locks.find({_id: "balancer"})
{ "_id" : "balancer", "process" : "myserver.mongodb.com:27017:1452776409:1804289383",
"state" : 2, "ts" : ObjectId("56cb817f2c4edd1226d6ae07"), "when" : ISODate("2016-02-22T21:45:35.360Z"), "who" : "myserver.mongodb.com:27017:1452776409:1804289383:Balancer:846930886",
"why" : "doing balance round" }
Also, if I try to run sh.startBalancer()
it times out:
mongos> sh.startBalancer()
2016-02-23T22:51:11.204-0500 E QUERY [thread1] Error: assert.soon failed, msg:Waited too long for lock balancer to change to state undefined :
doassert@src/mongo/shell/assert.js:15:14
assert.soon@src/mongo/shell/assert.js:200:13
sh.waitForDLock@src/mongo/shell/utils_sh.js:171:1
sh.waitForBalancer@src/mongo/shell/utils_sh.js:264:9
sh.startBalancer@src/mongo/shell/utils_sh.js:146:5
@(shell):1:1
The sh.status() output shows:
balancer:
Currently enabled: yes
Currently running: yes
Balancer lock taken at Mon Feb 22 2016 16:45:35 GMT-0500 (EST) by myserver.mongodb.com:27017:1452776409:1804289383:Balancer:846930886
Balancer active window is set between 8:00 and 6:00 server local time
Failed balancer rounds in last 5 attempts: 5
Last reported error: Connection refused
Time of Reported error: Tue Feb 23 2016 17:27:26 GMT-0500 (EST)
Migration Results for the last 24 hours:
No recent migrations
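As a side check on how stale the lock is: the first four bytes of an ObjectId encode its creation time in seconds, so the ts value from the locks entry can be decoded and compared with the time of the failed sh.startBalancer() call. The following is a minimal sketch in plain JavaScript; objectIdTimestamp and lockAgeHours are hypothetical helper names, not shell built-ins (in the mongo shell itself, ObjectId(...).getTimestamp() gives the same time).

```javascript
// Hypothetical helper: decode the creation time embedded in an ObjectId hex string.
// The first 8 hex characters are a big-endian Unix timestamp in seconds.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Hypothetical helper: how many hours the lock has been held as of `now`.
function lockAgeHours(hex, now) {
  return (now.getTime() - objectIdTimestamp(hex).getTime()) / 3600000;
}

// ts from the balancer lock entry shown above.
const takenAt = objectIdTimestamp("56cb817f2c4edd1226d6ae07");
console.log(takenAt.toISOString()); // 2016-02-22T21:45:35.000Z

// Time of the failed sh.startBalancer() call (22:51:11 -0500), in UTC.
const failedAt = new Date("2016-02-24T03:51:11.204Z");
console.log(lockAgeHours("56cb817f2c4edd1226d6ae07", failedAt).toFixed(1) + " hours");
```

A balancer round that has supposedly been running for more than a day with no recent migrations suggests the lock is stale rather than actively held.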
I have tried restarting the servers, stepping down the primaries, setting the balancer lock's state to 0 and running sh.startBalancer(),
and removing the balancer entry from the locks collection and running sh.startBalancer()
again, all with no results.
Upvotes: 3
Views: 2551
Reputation: 8995
In the end, it was an issue with the server clocks being out of sync. For some reason, the log messages about this didn't appear until the next day.
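For anyone who wants to check for this without waiting on the logs, here is a rough sketch of comparing the clocks. It assumes you have collected each server's reported time yourself, for example from the localTime field returned by db.adminCommand({ isMaster: 1 }) on each mongos, config server and shard. The function name clockSkewReport and the host names are made up for illustration, and readings taken one after another include network and typing delay, so only a gap of many seconds is meaningful.

```javascript
// Hypothetical helper: given one { host, localTime } reading per server,
// report which hosts have the earliest and latest clocks and the gap between them.
function clockSkewReport(samples) {
  if (samples.length === 0) {
    throw new Error("need at least one sample");
  }
  const sorted = samples
    .map(s => ({ host: s.host, ms: new Date(s.localTime).getTime() }))
    .sort((a, b) => a.ms - b.ms);
  const earliest = sorted[0];
  const latest = sorted[sorted.length - 1];
  return {
    earliest: earliest.host,
    latest: latest.host,
    skewMs: latest.ms - earliest.ms
  };
}

// Example readings (made-up hosts and times, gathered at roughly the same moment).
const report = clockSkewReport([
  { host: "cfg1.example.net:27019", localTime: "2016-02-23T22:51:10.000Z" },
  { host: "shard1.example.net:27018", localTime: "2016-02-23T22:51:11.000Z" },
  { host: "mongos1.example.net:27017", localTime: "2016-02-23T22:51:55.000Z" }
]);
console.log(report.latest + " is ahead of " + report.earliest + " by " + report.skewMs + " ms");
```

If the gap is large, fixing time synchronization (NTP or similar) on the servers is the first thing to try.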
Hope this helps someone with a similar issue :)
Upvotes: 2