Reputation: 877
I have a MongoDB replica set with 2 nodes (node0, node1), and one day one of them (node1) crashed.
Considering that deleting all of node1's data and restarting it would take a long time, I shut down node0 and rsynced its data to node1.
After that, I started node0 and node1. Both members are stuck in STARTUP2; below is some of the log:
Sat Feb 8 13:14:22.031 [rsMgr] replSet I don't see a primary and I can't elect myself
Sat Feb 8 13:14:24.888 [rsStart] replSet initial sync pending
Sat Feb 8 13:14:24.889 [rsStart] replSet initial sync need a member to be primary or secondary to do our initial sync
How to solve this problem?
Upvotes: 3
Views: 11490
Reputation: 4193
EDIT 10/29/15: I found there's actually an easier way to get your primary back, by using rs.reconfig with the option {force: true}. You can find the detailed documentation here. Use it with caution though; as mentioned in the documentation, it may cause a rollback.
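A minimal sketch of that in the mongo shell, run on the surviving member (keeping members[0] is my assumption; keep whichever member is actually alive):

cfg = rs.conf()
cfg.members = [cfg.members[0]]     // keep only the surviving member, e.g. node0
rs.reconfig(cfg, {force: true})    // forced reconfig; may trigger a rollback as noted above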
You should never build a 2-member replica set, because once one member is down, the other one can't tell whether the peer has failed or it has itself been cut off from the network. As a solution, add an arbiter node for voting.
So your problem is: when you restart node0 while node1 is already dead, no other node votes for it, and it doesn't know whether it's still suitable to run as a primary node. Thus it falls back to a secondary, and that's why you see the message
Sat Feb 8 13:14:24.889 [rsStart] replSet initial sync need a member to be primary or secondary to do our initial sync
I'm afraid that, as far as I know, there's no official way to resolve this issue other than rebuilding the replica set (though you can find a trick later). Follow the MongoDB manual to recreate the replica set; a sketch of the shell side is below.
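A minimal sketch of that in the mongo shell, run on node0, assuming both mongods were restarted with --replSet rs0 and node1's data directory was emptied first (the set name and hostnames are taken from the question):

rs.initiate({ _id: "rs0", members: [{ _id: 0, host: "node0" }] })
rs.add("node1")    // node1 then runs a full initial sync from node0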
I'm afraid you'll have to spend a long time waiting for the sync to finish. And then you are good to go.
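A rough way to watch the sync from the shell (my addition, not part of the original answer):

rs.status().members.forEach(function (m) {
    print(m.name + ": " + m.stateStr)    // node1 stays in STARTUP2 until initial sync finishes
})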
I strongly recommend adding an arbiter to avoid this situation again.
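A sketch of that from the primary's shell ("node2" is a hypothetical hostname for the arbiter):

rs.addArb("node2")    // the arbiter votes in elections but holds no data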
So, the above is the official way to resolve your issue. Below is how I did it with MongoDB 2.4.8. I didn't find any documentation to prove it, so there's absolutely NO guarantee; you do it at your own risk. Anyway, if it doesn't work for you, just fall back to the official way. Worth trying ;)
Go to the local database and delete node1 from db.system.replset. For example, on my machine it originally looked like:
{ "_id": "rs0", "version": 5, "members": [{ "_id": 0, "host": "node0" }, { "_id": 1, "host": "node1" }] }
You should change it to
{
    "_id": "rs0",
    "version": 5,
    "members": [{
        "_id": 0,
        "host": "node0"
    }]
}
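A minimal sketch of that edit in the mongo shell (whether mongod should be restarted without the --replSet option before touching the local database is an assumption on my part; the same no-guarantee caveat as above applies):

use local
db.system.replset.update(
    { "_id": "rs0" },
    { $set: { "members": [{ "_id": 0, "host": "node0" }] } }
)
db.system.replset.find().pretty()    // verify node1 is no longer listed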
That's all. Let me know if you have any questions.
Upvotes: 5
Reputation: 1664
I had the same issue when using MMS. I created a new ReplicaSet of 3 machines (2 data + 1 arbiter, which is tricky to set up on MMS, btw), and they were all stuck in STARTUP2 with "initial sync need a member to be primary or secondary to do our initial sync":
myReplicaSet:STARTUP2> rs.status()
{
    "set" : "myReplicaSet",
    "date" : ISODate("2015-01-17T21:20:12Z"),
    "myState" : 5,
    "members" : [
        {
            "_id" : 0,
            "name" : "server1.mydomain.com:27000",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 142,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2015-01-17T21:20:12Z"),
            "lastHeartbeatRecv" : ISODate("2015-01-17T21:20:11Z"),
            "pingMs" : 0,
            "lastHeartbeatMessage" : "initial sync need a member to be primary or secondary to do our initial sync"
        },
        {
            "_id" : 1,
            "name" : "server2.mydomain.com:27000",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 142,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "infoMessage" : "initial sync need a member to be primary or secondary to do our initial sync",
            "self" : true
        },
        {
            "_id" : 3,
            "name" : "server3.mydomain.com:27000",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 140,
            "lastHeartbeat" : ISODate("2015-01-17T21:20:12Z"),
            "lastHeartbeatRecv" : ISODate("2015-01-17T21:20:10Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}
To fix it, I used yaoxing's answer. I had to shut down the ReplicaSet on MMS and wait for all members to be shut down. It took a while... Then, on all of them, I removed the contents of the data dir:
sudo rm -Rf /var/data/*
And only after that, I turned the ReplicaSet on, and all was fine.
Upvotes: 0