Reputation: 312
When I stop the nodes of my replica set and start them up again, the primary node goes into state "RECOVERING".
I have a replica set that was created and running without authorization. In order to use authorization I added users with db.createUser(...) and enabled authorization in the configuration file:
security:
  authorization: "enabled"
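For illustration, I created the users roughly like this (the username, password, and role shown here are placeholders, not my actual values):
use admin
db.createUser({
  user: "admin",     // placeholder
  pwd: "secret",     // placeholder
  roles: [ { role: "root", db: "admin" } ]
})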
Before stopping the replica set (even when restarting the cluster without adding the security params), rs.status() shows:
{
    "set" : "REPLICASET",
    "date" : ISODate("2016-09-08T09:57:50.335Z"),
    "myState" : 1,
    "term" : NumberLong(7),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 0,
            "name" : "192.168.1.167:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 301,
            "optime" : {
                "ts" : Timestamp(1473328390, 2),
                "t" : NumberLong(7)
            },
            "optimeDate" : ISODate("2016-09-08T09:53:10Z"),
            "electionTime" : Timestamp(1473328390, 1),
            "electionDate" : ISODate("2016-09-08T09:53:10Z"),
            "configVersion" : 1,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "192.168.1.168:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 295,
            "optime" : {
                "ts" : Timestamp(1473328390, 2),
                "t" : NumberLong(7)
            },
            "optimeDate" : ISODate("2016-09-08T09:53:10Z"),
            "lastHeartbeat" : ISODate("2016-09-08T09:57:48.679Z"),
            "lastHeartbeatRecv" : ISODate("2016-09-08T09:57:49.676Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "192.168.1.167:27017",
            "configVersion" : 1
        },
        {
            "_id" : 2,
            "name" : "192.168.1.169:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 295,
            "optime" : {
                "ts" : Timestamp(1473328390, 2),
                "t" : NumberLong(7)
            },
            "optimeDate" : ISODate("2016-09-08T09:53:10Z"),
            "lastHeartbeat" : ISODate("2016-09-08T09:57:48.680Z"),
            "lastHeartbeatRecv" : ISODate("2016-09-08T09:57:49.054Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "192.168.1.168:27017",
            "configVersion" : 1
        }
    ],
    "ok" : 1
}
In order to start using this configuration, I have stopped each node as follows:
[root@n--- etc]# mongo --port 27017 --eval 'db.adminCommand("shutdown")'
MongoDB shell version: 3.2.9
connecting to: 127.0.0.1:27017/test
2016-09-02T14:26:15.784+0200 W NETWORK [thread1] Failed to connect to 127.0.0.1:27017, reason: errno:111 Connection refused
2016-09-02T14:26:15.785+0200 E QUERY [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:231:14
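This network error is expected here: the server closes the connection as it shuts down, so the shell cannot get a reply. An equivalent way to stop a node is the shell helper, run against the admin database:
mongo --port 27017 admin --eval 'db.shutdownServer()'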
After this shutdown, I confirmed that the process no longer exists by checking the output of ps -ax | grep mongo.
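For example (the [m] in the pattern keeps grep from matching its own process):
ps -ax | grep [m]ongod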
But when I start the nodes again and log in with my credentials, rs.status() now indicates:
{
    "set" : "REPLICASET",
    "date" : ISODate("2016-09-08T13:19:12.963Z"),
    "myState" : 3,
    "term" : NumberLong(7),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 0,
            "name" : "192.168.1.167:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 42,
            "optime" : {
                "ts" : Timestamp(1473340490, 6),
                "t" : NumberLong(7)
            },
            "optimeDate" : ISODate("2016-09-08T13:14:50Z"),
            "infoMessage" : "could not find member to sync from",
            "configVersion" : 1,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "192.168.1.168:27017",
            "health" : 0,
            "state" : 6,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2016-09-08T13:19:10.553Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : NumberLong(0),
            "authenticated" : false,
            "configVersion" : -1
        },
        {
            "_id" : 2,
            "name" : "192.168.1.169:27017",
            "health" : 0,
            "state" : 6,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2016-09-08T13:19:10.552Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : NumberLong(0),
            "authenticated" : false,
            "configVersion" : -1
        }
    ],
    "ok" : 1
}
Why? Perhaps shutdown is not a good way to stop mongod; however, I also tested with kill <pid>, and the restart ends up in the same state.
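For reference, a plain kill sends SIGTERM, which mongod handles as a clean shutdown (kill -9 should never be used, since it can leave the data files inconsistent). Assuming one mongod per host:
kill $(pidof mongod)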
In this state I don't know how to repair the cluster; I have started over (removing the dbpath files and reconfiguring the replica set). I also tried '--repair', but it did not work.
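For reference, a repair run of that kind would look like this, with the node stopped first (the dbpath here is illustrative):
mongod --dbpath /var/lib/mongo --repair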
Info about my setup: I start each node with
mongod --config /etc/mongod.conf
An example of the "net.bindIp" configuration, from mongod.conf on one machine:
net:
  port: 27017
  bindIp: 127.0.0.1,192.168.1.167
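The other relevant section is replication; judging by the "set" field in rs.status() above, it presumably looks like:
replication:
  replSetName: "REPLICASET"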
Upvotes: 7
Views: 3263
Reputation: 312
Finally I resolved the problem: for a replica set with authorization enabled, a keyFile is MANDATORY so that the nodes can authenticate to each other. When I first specified the keyFile it returned an error, because mongod.log indicated:
I ACCESS [main] permissions on /etc/keyfile are too open
The keyfile must have permissions 400. Thanks @Saleem.
When people say "You can add a keyfile" I read it as an optional parameter, but it is mandatory.
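For anyone else hitting this, a minimal sketch of creating and securing the keyfile on Linux (assuming mongod runs as the mongod user; the path matches the log line above):
# generate random key content; it only has to be identical on all nodes
openssl rand -base64 741 > /etc/keyfile
# mongod refuses to start if the keyfile permissions are too open
chmod 400 /etc/keyfile
chown mongod:mongod /etc/keyfile
Then reference it from mongod.conf on every node:
security:
  authorization: "enabled"
  keyFile: /etc/keyfile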
Upvotes: 2
Reputation: 9
Nodes should be shut down one at a time, so that another secondary member can be elected primary. A restarted node will be in RECOVERING state while it syncs from the other members. Shutting the nodes down one by one avoids having to re-add them.
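A rolling restart along those lines might look like this (hosts taken from the question; waiting for each node to come back as SECONDARY before moving on is the important part):
# restart each secondary in turn
mongo --host 192.168.1.168 --port 27017 admin --eval 'db.shutdownServer()'
# ... restart that mongod, wait until rs.status() shows it as SECONDARY ...
# finally make the primary step down so a secondary takes over, then restart it
mongo --host 192.168.1.167 --port 27017 admin --eval 'rs.stepDown()'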
Upvotes: 0
Reputation: 9008
Note: This solution is Windows specific but can be ported to *nix based systems easily.
You'll need to take steps in sequence. First of all, start your mongod instances.
start "29001" mongod --dbpath "C:\data\db\r1" --port 29001
start "29002" mongod --dbpath "C:\data\db\r2" --port 29002
start "29003" mongod --dbpath "C:\data\db\r3" --port 29003
Connect to each node with the mongo shell and create an administrator user. I prefer creating a superuser.
> use admin
> db.createUser({user: "root", pwd: "123456", roles:["root"]})
You may create other users as deemed necessary.
Create key file. See documentation for valid key file contents.
Note: On *nix-based systems, set the key file's permissions to 400 (chmod 400).
In my case, I created the key file as
echo mysecret==key > C:\data\key\key.txt
Now restart your MongoDB servers with the --keyFile and --replSet flags enabled.
start "29001" mongod --dbpath "C:\data\db\r1" --port 29001 --replSet "rs1" --keyFile C:\data\key\key.txt
start "29002" mongod --dbpath "C:\data\db\r2" --port 29002 --replSet "rs1" --keyFile C:\data\key\key.txt
start "29003" mongod --dbpath "C:\data\db\r3" --port 29003 --replSet "rs1" --keyFile C:\data\key\key.txt
Once all mongod instances are up and running, connect to any one of them with authentication.
mongo --port 29001 -u "root" -p "123456" --authenticationDatabase "admin"
Initiate the replica set:
> use admin
> rs.initiate()
rs1:PRIMARY> rs.add("localhost:29002")
{ "ok" : 1 }
rs1:PRIMARY> rs.add("localhost:29003")
{ "ok" : 1 }
Note: You may need to replace localhost with a machine name or IP address.
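As a quick sanity check afterwards, something like this should print one PRIMARY and two SECONDARY members:
rs1:PRIMARY> rs.status().members.forEach(function (m) { print(m.name + ": " + m.stateStr) })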
Upvotes: 1