Reputation: 739
Following is my architecture:
2 Servers:
Server 1: running Apache NiFi + ZooKeeper (not embedded)
Server 2: running Apache NiFi + ZooKeeper (not embedded)
To test failover, I shut down the server that has been elected Cluster Coordinator.
In this case, ZooKeeper should automatically elect the remaining server as leader. Instead it keeps failing and goes into a continuous loop of trying to connect to the first server.
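For reference, each node's role can be checked with the standard ZooKeeper tooling (the install path below is a placeholder; on newer ZooKeeper versions the srvr command may need to be whitelisted via 4lw.commands.whitelist):
/path/to/zookeeper/bin/zkServer.sh status   # prints "Mode: leader" or "Mode: follower"
echo srvr | nc localhost 2181               # the "Mode:" line shows the current role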
ZooKeeper logs on Server 2 when the leader (Server 1) went down:
2019-10-22 18:44:01,135 [myid:2] - WARN [NIOWorkerThread-2:NIOServerCnxn@370] - Exception causing close of session 0x0: ZooKeeperServer not running
2019-10-22 18:44:02,925 [myid:2] - WARN [NIOWorkerThread-3:NIOServerCnxn@370] - Exception causing close of session 0x0: ZooKeeperServer not running
2019-10-22 18:44:03,320 [myid:2] - WARN [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumCnxManager@677] -
Cannot open channel to 1 at election address ec2-server-1.compute-1.amazonaws.com/172.xx.x.x:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
Server 2 Config files:
zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/home/ec2-user/zookeeper
clientPort=2181
server.1=ec2-server-1.compute-1.amazonaws.com:2888:3888
server.2=0.0.0.0:2888:3888
nifi.properties
nifi.cluster.is.node=true
nifi.cluster.node.address=ec2-server-2.compute-1.amazonaws.com
nifi.cluster.node.protocol.port=8082
nifi.cluster.flow.election.max.wait.time=2 mins
nifi.cluster.flow.election.max.candidates=1
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=localhost:2181
nifi.zookeeper.root.node=/nifi
Server 1 Config files:
zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/home/ec2-user/zookeeper
clientPort=2181
server.1=0.0.0.0:2888:3888
server.2=ec2-server-2.compute-1.amazonaws.com:2888:3888
nifi.properties
nifi.cluster.is.node=true
nifi.cluster.node.address=ec2-server-1.compute-1.amazonaws.com
nifi.cluster.node.protocol.port=8082
nifi.cluster.flow.election.max.wait.time=2 mins
nifi.cluster.flow.election.max.candidates=1
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=localhost:2181
nifi.zookeeper.root.node=/nifi
What am I doing wrong?
Upvotes: 1
Views: 1732
You need at least three ZooKeeper nodes to be able to handle the failure of one node.
Check the Admin guide:
Clustered (Multi-Server) Setup
For reliable ZooKeeper service, you should deploy ZooKeeper in a cluster known as an ensemble. As long as a majority of the ensemble are up, the service will be available. Because Zookeeper requires a majority, it is best to use an odd number of machines. For example, with four machines ZooKeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority. However, with five machines ZooKeeper can handle the failure of two machines.
There is a simpler explanation here as well.
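For illustration, a minimal zoo.cfg for a three-node ensemble would look roughly like this (ec2-server-3 is a placeholder for whichever third machine you add; as in your current files, each server can keep 0.0.0.0 for its own entry):
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/home/ec2-user/zookeeper
clientPort=2181
server.1=ec2-server-1.compute-1.amazonaws.com:2888:3888
server.2=ec2-server-2.compute-1.amazonaws.com:2888:3888
# placeholder hostname for the additional third node
server.3=ec2-server-3.compute-1.amazonaws.com:2888:3888
With three members, losing any one still leaves two, which is a majority, so the remaining nodes can elect a leader. It also helps to point nifi.zookeeper.connect.string at all ensemble members (a comma-separated list of host:2181 entries for all three servers) instead of localhost, so NiFi can still reach ZooKeeper even if its local instance is down.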
Upvotes: 1