DmitrySemenov

Reputation: 10325

ZooKeeper cluster of 2 nodes - strange behavior when one node is down programmatically

When both nodes are operational, everything works as expected:

[dmitry@zk2-prod]/etc/supervisor.d% sudo /opt/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: leader

However, as soon as I stop one of the nodes, zk1-prod (via supervisord's supervisorctl), the status check on zk2-prod fails:

[dmitry@zk2-prod]/etc/supervisor.d% sudo /opt/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Error contacting service. It is probably not running

However, supervisord still reports the process as running:

[dmitry@zk2-prod]/etc/supervisor.d% sudo supervisorctl status
zookeeper                        RUNNING   pid 4838, uptime 0:04:01

As soon as I bring the other node back, I immediately get the first output again (Mode: leader).

[dmitry@zk2-prod]/etc/supervisor.d% ps aufx G zoo
89:zookeep+  4838  0.2  1.4 2970424 56816 ?       Sl   19:32   0:00  \_ java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /opt/zookeeper/bin/../build/classes:/opt/zookeeper/bin/../build/lib/*.jar:/opt/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/opt/zookeeper/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper/bin/../lib/jline-0.9.94.jar:/opt/zookeeper/bin/../zookeeper-3.4.10.jar:/opt/zookeeper/bin/../src/java/lib/*.jar:/opt/zookeeper/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/zookeeper/bin/../conf/zoo.cfg

Do I need 3 instances at least so org.apache.zookeeper.server.quorum.QuorumPeerMain can select a leader?

I thought a single instance would be able to elect itself leader and continue serving requests.

Am I missing something?

Upvotes: 1

Views: 587

Answers (1)

franklinsijo

Reputation: 18270

Do I need 3 instances at least so org.apache.zookeeper.server.quorum.QuorumPeerMain can select a leader?

Yes, you need at least 3 servers to tolerate the loss of one.

In a ZooKeeper ensemble, the service stays available as long as a majority of the servers (a quorum) is available. In a multi-server ensemble, a server cannot elect itself leader on its own; it needs votes from a majority of the ensemble.

In this case, where 2 servers form the ensemble, the majority is 2. When one server is lost, a majority can no longer be formed. Losing the majority is treated as a failure of the quorum, so the remaining server stops serving requests.

The 3-server scenario is easier to explain: if one server is lost, 2 remain and still form a majority. If 2 are lost, the single remaining server cannot form a majority, and the ZooKeeper service becomes unavailable.
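The majority arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule, not ZooKeeper code; the function names `quorum_size` and `tolerated_failures` are made up for this example.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare ensemble sizes 1 through 5.
for n in range(1, 6):
    print(f"{n} server(s): quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this rule a 2-server ensemble tolerates no more failures than a single server does, which is why ensembles are usually sized with an odd number of servers.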

Upvotes: 1
