tuk

Reputation: 6872

Zookeeper - Error the current epoch, is older than the last zxid

I am running a 3-node ZooKeeper ensemble on version 3.4.13. Sometimes, after a machine reboots, ZooKeeper fails to start on one of the nodes, and I see the following errors in the logs:

2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
... 4 more

I have seen ZOOKEEPER-2354 and the symptoms look similar.

support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
8
support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
7
support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
8
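
For reference, the numbers in the exception can be decoded by hand. A zxid is a 64-bit value whose high 32 bits hold the epoch and whose low 32 bits hold a counter. The sketch below (the helper names `zxid_epoch` and `zxid_counter` are made up for illustration) splits the zxid from the log message; the result is epoch 8, while currentEpoch on disk contains 7, which is the mismatch the exception reports.

```shell
# A zxid packs two fields into one 64-bit number:
#   high 32 bits = epoch, low 32 bits = counter within that epoch.
# Helper names are hypothetical, chosen only for this example.
zxid_epoch()   { echo $(( $1 >> 32 )); }
zxid_counter() { echo $(( $1 & 0xFFFFFFFF )); }

zxid=34359738370   # value taken from the exception message
echo "epoch=$(zxid_epoch "$zxid") counter=$(zxid_counter "$zxid")"
# prints: epoch=8 counter=2
```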

That issue is reported as fixed in 3.4.6, but I am seeing the same behavior in 3.4.13.

Can someone tell me how to recover the ZooKeeper node from this state?

Upvotes: 5

Views: 5440

Answers (1)

tuk

Reputation: 6872

This has been discussed on the ZooKeeper mailing list. The relevant quote from that thread:

With the other two ZooKeeper servers running, I stopped ZooKeeper on the broken node, then deleted all the contents of /var/lib/zookeeper/version-2 and started ZooKeeper on the node again. It is running fine now and got all the data from the other servers.
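
The steps in the quote can be sketched roughly as follows. This is only an illustration under some assumptions: the helper name `move_aside_version2` and the backup path are hypothetical, the stop/start commands depend on how ZooKeeper is installed on your machine, and the sketch moves the files aside instead of deleting them so they can be restored if something goes wrong. The `myid` file sits in the data directory itself, not in `version-2`, so it is left untouched.

```shell
# Move everything under <dataDir>/version-2 into a backup directory
# (a cautious variant of "delete all the contents").
# Usage: move_aside_version2 <dataDir> <backupDir>
# May need to run as a user allowed to modify the data directory.
move_aside_version2() {
  local data_dir=$1 backup_dir=$2
  local v2="$data_dir/version-2"
  [ -d "$v2" ] || { echo "no such directory: $v2" >&2; return 1; }
  mkdir -p "$backup_dir" || return 1
  # -mindepth 1 -maxdepth 1 selects the direct children, dotfiles included
  find "$v2" -mindepth 1 -maxdepth 1 -exec mv -- {} "$backup_dir"/ \;
}

# Intended order, on the broken node only, while the other two servers
# are up and hold quorum:
#   1. stop ZooKeeper on this node   (for example: zkServer.sh stop)
#   2. move_aside_version2 /var/lib/zookeeper /var/tmp/zk-version-2.bak
#   3. start ZooKeeper on this node  (for example: zkServer.sh start)
```

After the restart, the node should rejoin the ensemble as a follower and sync its data from the leader, as described in the quote.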

Upvotes: 5
