I am running a ZooKeeper 3.4.13 ensemble of 3 nodes. Sometimes, after a machine reboot, ZooKeeper fails to start on one of the nodes and I see the following errors in the logs:
2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
	at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
	at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
	at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
	at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
	at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
	... 4 more
I have seen ZOOKEEPER-2354 and the symptoms look similar.
support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
8
support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
7
support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
8
ZOOKEEPER-2354 is marked as fixed in 3.4.6, but I am observing the same behavior in 3.4.13.
How can I recover the ZooKeeper node from this state?
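The two numbers in the error are easier to compare once the zxid is decoded. A zxid is a 64-bit transaction id whose high 32 bits hold the epoch and whose low 32 bits hold a counter within that epoch. The bash snippet below is only an illustration (it is not part of ZooKeeper) that splits the logged zxid and repeats the comparison the error message describes, using the values from the log and the `currentEpoch` file above:

```bash
#!/usr/bin/env bash
# Split the zxid from the IOException into epoch (high 32 bits) and
# per-epoch counter (low 32 bits). Relies on bash's 64-bit integer arithmetic.
zxid=34359738370   # "last zxid" from the log message
current_epoch=7    # contents of version-2/currentEpoch

zxid_epoch=$(( zxid >> 32 ))
zxid_counter=$(( zxid & 0xFFFFFFFF ))

printf 'zxid 0x%x -> epoch %d, counter %d\n' "$zxid" "$zxid_epoch" "$zxid_counter"

# This mirrors the condition the error message describes.
if (( zxid_epoch > current_epoch )); then
  echo "currentEpoch ($current_epoch) is older than the epoch of the last zxid ($zxid_epoch)"
fi
```

It prints `zxid 0x800000002 -> epoch 8, counter 2`, so the last transaction on disk belongs to epoch 8, matching `acceptedEpoch` and `currentEpoch.tmp`, while `currentEpoch` still says 7. ZooKeeper appears to write these epoch files through a temporary `.tmp` file that is then renamed into place, so a leftover `currentEpoch.tmp` containing 8 suggests, without proving it, that the reboot interrupted the update of `currentEpoch`.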
This has been discussed in a thread on the ZooKeeper mailing list. The relevant quote from that thread:
With the other two ZooKeeper servers running, I stopped ZooKeeper on the broken node, then deleted all the contents inside /var/lib/zookeeper/version-2 and started ZooKeeper back up on the node. It is running fine now and got all the data from the other servers.
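The procedure in the quote can be sketched as a small bash function. This is a sketch under assumptions, not an official tool: it assumes `/var/lib/zookeeper` is the `dataDir` (as in the question), that ZooKeeper is already stopped on this node, and that the other two nodes still form a quorum. The function name `reset_version_dir` is hypothetical. Instead of deleting, it moves the contents of `version-2` into a timestamped backup directory next to it, so the old snapshots and epoch files can still be inspected; anything outside `version-2`, such as a `myid` file in `dataDir`, is left alone.

```bash
#!/usr/bin/env bash
# Hypothetical helper: empty <dataDir>/version-2 by moving its contents into a
# timestamped backup directory next to it. version-2 itself stays in place,
# so its owner and permissions are unchanged.
# Assumes ZooKeeper is STOPPED on this node and the other nodes still have quorum.
reset_version_dir() {
  local data_dir=$1
  local version_dir="$data_dir/version-2"
  local backup
  backup="$data_dir/version-2.bak.$(date +%Y%m%d%H%M%S)"

  if [[ ! -d "$version_dir" ]]; then
    echo "error: $version_dir does not exist" >&2
    return 1
  fi

  # Gather every entry (dotfiles too); nullglob yields an empty array for an empty dir.
  local -a entries
  shopt -s nullglob dotglob
  entries=("$version_dir"/*)
  shopt -u nullglob dotglob

  mkdir -p "$backup" || return 1
  if (( ${#entries[@]} > 0 )); then
    # Moves snapshot.*, log.*, acceptedEpoch, currentEpoch, currentEpoch.tmp, ...
    mv -- "${entries[@]}" "$backup"/ || return 1
  fi

  # Print the backup location so the caller can inspect or restore it.
  echo "$backup"
}
```

On the broken node, usage might look like this: stop ZooKeeper with your init system (for example `sudo systemctl stop zookeeper`, assuming a systemd unit of that name), run `reset_version_dir /var/lib/zookeeper` in a root shell or as the user that owns the data directory, then start ZooKeeper again and check that it rejoins as a follower (for example with `echo srvr | nc localhost 2181`, assuming the default client port). Once the node has synced from the leader, the backup directory can be removed.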