MrElephant

Reputation: 312

Stopping and restarting a MongoDB replica set leaves the primary in RECOVERING status

When I stop the nodes of my replica set and start them up again, the primary node goes into the "RECOVERING" state.

I have a replica set that was created and running without authorization. In order to enable authorization I added users with db.createUser(...) and enabled authorization in the configuration file:

security:
   authorization: "enabled"
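
For reference, the users were created along these lines (the user name and credentials below are placeholders, not the exact ones I used):

use admin
db.createUser({
    user: "admin",                                // placeholder name
    pwd: "<password>",                            // placeholder password
    roles: [ { role: "root", db: "admin" } ]
})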

Before stopping the replica set (and even after restarting the cluster without adding the security params), rs.status() shows:

{
        "set" : "REPLICASET",
        "date" : ISODate("2016-09-08T09:57:50.335Z"),
        "myState" : 1,
        "term" : NumberLong(7),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.1.167:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 301,
                        "optime" : {
                                "ts" : Timestamp(1473328390, 2),
                                "t" : NumberLong(7)
                        },
                        "optimeDate" : ISODate("2016-09-08T09:53:10Z"),
                        "electionTime" : Timestamp(1473328390, 1),
                        "electionDate" : ISODate("2016-09-08T09:53:10Z"),
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "192.168.1.168:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 295,
                        "optime" : {
                                "ts" : Timestamp(1473328390, 2),
                                "t" : NumberLong(7)
                        },
                        "optimeDate" : ISODate("2016-09-08T09:53:10Z"),
                        "lastHeartbeat" : ISODate("2016-09-08T09:57:48.679Z"),
                        "lastHeartbeatRecv" : ISODate("2016-09-08T09:57:49.676Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "192.168.1.167:27017",
                        "configVersion" : 1
                },
                {
                        "_id" : 2,
                        "name" : "192.168.1.169:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 295,
                        "optime" : {
                                "ts" : Timestamp(1473328390, 2),
                                "t" : NumberLong(7)
                        },
                        "optimeDate" : ISODate("2016-09-08T09:53:10Z"),
                        "lastHeartbeat" : ISODate("2016-09-08T09:57:48.680Z"),
                        "lastHeartbeatRecv" : ISODate("2016-09-08T09:57:49.054Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "192.168.1.168:27017",
                        "configVersion" : 1
                }
        ],
        "ok" : 1
}

In order to start using this configuration, I have stopped each node as follows:

[root@n--- etc]# mongo --port 27017 --eval 'db.adminCommand("shutdown")'
MongoDB shell version: 3.2.9
connecting to: 127.0.0.1:27017/test
2016-09-02T14:26:15.784+0200 W NETWORK  [thread1] Failed to connect to 127.0.0.1:27017, reason: errno:111 Connection refused
2016-09-02T14:26:15.785+0200 E QUERY    [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:231:14

After this shutdown, I confirmed that the process no longer exists by checking the output of ps -ax | grep mongo.

But when I start the nodes again and log in with my credentials, rs.status() now shows:

{
        "set" : "REPLICASET",
        "date" : ISODate("2016-09-08T13:19:12.963Z"),
        "myState" : 3,
        "term" : NumberLong(7),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.1.167:27017",
                        "health" : 1,
                        "state" : 3,
                        "stateStr" : "RECOVERING",
                        "uptime" : 42,
                        "optime" : {
                                "ts" : Timestamp(1473340490, 6),
                                "t" : NumberLong(7)
                        },
                        "optimeDate" : ISODate("2016-09-08T13:14:50Z"),
                        "infoMessage" : "could not find member to sync from",
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "192.168.1.168:27017",
                        "health" : 0,
                        "state" : 6,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2016-09-08T13:19:10.553Z"),
                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                        "pingMs" : NumberLong(0),
                        "authenticated" : false,
                        "configVersion" : -1
                },
                {
                        "_id" : 2,
                        "name" : "192.168.1.169:27017",
                        "health" : 0,
                        "state" : 6,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2016-09-08T13:19:10.552Z"),
                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                        "pingMs" : NumberLong(0),
                        "authenticated" : false,
                        "configVersion" : -1
                }
        ],
        "ok" : 1
}

Why? Perhaps shutdown is not a good way to stop mongod; however, I also tested using kill <pid>, and the restart ends up in the same state.

In this state I don't know how to repair the cluster, so I have started over (removing the dbpath files and reconfiguring the replica set). I also tried --repair, but it has not worked.

Info about my system:

Upvotes: 7

Views: 3263

Answers (3)

MrElephant

Reputation: 312

Finally I resolved the problem: for a replica set cluster, a keyFile is MANDATORY so that all the nodes can communicate with each other. When I specified the keyFile it returned an error, because mongod.log indicated:

I ACCESS   [main] permissions on /etc/keyfile are too open

The keyfile must have 400 permissions. Thanks @Saleem.

When people say "you can add a keyfile" I was thinking of it as an optional parameter, but it is mandatory.
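
For anyone hitting the same error, a minimal sketch of how the keyfile ends up wired in (the /etc/keyfile path comes from the log line above; the mongod:mongod owner is an assumption based on a standard package install):

chmod 400 /etc/keyfile            # mongod rejects a keyfile whose permissions are "too open"
chown mongod:mongod /etc/keyfile  # assumes mongod runs as the "mongod" user

and the keyFile setting sits next to the authorization setting in the configuration file:

security:
   authorization: "enabled"
   keyFile: /etc/keyfile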

Upvotes: 2

Aayushi

Reputation: 9

Nodes should be shut down one at a time, so that another secondary member can be elected primary. A node will be in the RECOVERING state while it syncs from another member. Shutting the nodes down one by one means you will not need to re-add them.
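
A rough sketch of that rolling restart (the commands are illustrative; add -u/-p/--authenticationDatabase if authorization is already enabled):

# one at a time on each secondary: shut it down, restart mongod, and wait for
# rs.status() to show it as SECONDARY again before touching the next node
mongo --port 27017 --eval 'db.getSiblingDB("admin").shutdownServer()'

# on the primary, step down first so another member is elected, then shut it down last
mongo --port 27017 --eval 'rs.stepDown()'
mongo --port 27017 --eval 'db.getSiblingDB("admin").shutdownServer()'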

Upvotes: 0

Saleem

Reputation: 9008

Note: This solution is Windows specific but can be ported to *nix based systems easily.

You'll need to take steps in sequence. First of all, start your mongod instances.

start "29001" mongod --dbpath "C:\data\db\r1" --port 29001
start "29002" mongod --dbpath "C:\data\db\r2" --port 29002
start "29003" mongod --dbpath "C:\data\db\r3" --port 29003 

Connect with mongo to each node and create an administrator user. I prefer creating a superuser.

> use admin
> db.createUser({user: "root", pwd: "123456", roles:["root"]})

You may create other users as deemed necessary.

Create a key file. See the documentation for valid key file contents.

Note: On *nix-based systems, set the key file's permissions to 400 with chmod.

In my case, I created the key file as:

echo mysecret==key > C:\data\key\key.txt
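
On a *nix system, a rough equivalent of the same step (the path is illustrative) would be:

openssl rand -base64 756 > /data/key/key.txt
chmod 400 /data/key/key.txt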

Now restart your MongoDB servers with --keyFile and --replSet flags enabled.

start "29001" mongod --dbpath "C:\data\db\r1" --port 29001 --replSet "rs1" --keyFile C:\data\key\key.txt
start "29002" mongod --dbpath "C:\data\db\r2" --port 29002 --replSet "rs1" --keyFile C:\data\key\key.txt
start "29003" mongod --dbpath "C:\data\db\r3" --port 29003 --replSet "rs1" --keyFile C:\data\key\key.txt

Once all mongod instances are up and running, connect to any one of them with authentication.

mongo --port 29001 -u "root" -p "123456" --authenticationDatabase "admin"

Initiate the replica set:

> use admin
> rs.initiate()
rs1:PRIMARY> rs.add("localhost:29002")
{ "ok" : 1 }
rs1:PRIMARY> rs.add("localhost:29003")
{ "ok" : 1 }

Note: You may need to replace localhost with machine name or IP address.
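
If you would rather list the members up front instead of calling rs.add() for each one, rs.initiate() also accepts a configuration document; a rough sketch, with placeholder host names:

rs.initiate({
    _id: "rs1",
    members: [
        { _id: 0, host: "host1.example.com:29001" },
        { _id: 1, host: "host2.example.com:29002" },
        { _id: 2, host: "host3.example.com:29003" }
    ]
})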

Upvotes: 1
