Reputation: 155
I have been reading about implementing leader election in Java using ZooKeeper. I understand the algorithm described here, but I have a subtle question about it.
In the algorithm as explained, each node lists all the child nodes of the "/election" node and treats the one with the smallest sequence number as the leader.
In that case, how do they decide which nodes are in the election and which are not? What I want to know is what condition prevents a node that is late in creating its child node from participating in the leader election. Is it a timeout? If so, how and where is it counted?
Upvotes: 2
Views: 4089
Reputation: 111
How do they decide which nodes are in and which are not?
Any server that has created a child znode under /election is eligible to become the leader. The only way to keep a server out of the election is to not create a child znode for it.
Upvotes: 0
Reputation: 401
When a node creates a sequential ephemeral znode under /election to try to assume leadership, ZooKeeper automatically assigns a sequence number to that znode. How does a server know whether it can assume leadership? It issues getChildren to fetch the children of /election and checks whether the znode it just created has the smallest sequence number. If it does, the server may assume the leader role. If not, it sets a watch on the znode with the largest sequence number that is still smaller than its own.
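To make that decision step concrete, here is a minimal sketch in plain Java. It does not talk to ZooKeeper at all: it assumes you already have the list of child names returned by getChildren (which comes back in no particular order) and the name of your own znode. The class and method names (`ElectionStep`, `sequenceOf`, `znodeToWatch`) are made up for illustration, and the parsing assumes the sequence number is whatever follows the last underscore in the name.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the ZooKeeper API.
final class ElectionStep {

    // Parses the sequence number that follows the last '_' in a znode name.
    // Leading zeros (e.g. "guid-n_0000000001") are handled by parseInt.
    static int sequenceOf(String znodeName) {
        return Integer.parseInt(znodeName.substring(znodeName.lastIndexOf('_') + 1));
    }

    // Returns an empty Optional when ownZnode has the smallest sequence number
    // (this server may lead). Otherwise returns the child with the largest
    // sequence number that is still smaller than our own: the one to watch.
    static Optional<String> znodeToWatch(List<String> children, String ownZnode) {
        int own = sequenceOf(ownZnode);
        return children.stream()
                .filter(child -> sequenceOf(child) < own)
                .max(Comparator.comparingInt(ElectionStep::sequenceOf));
    }

    public static void main(String[] args) {
        // Deliberately unordered, as a getChildren result may be.
        List<String> children = Arrays.asList("guid-n_2", "guid-n_0", "guid-n_1");
        for (String me : children) {
            String decision = znodeToWatch(children, me)
                    .map(target -> "watch " + target)
                    .orElse("leader");
            System.out.println(me + " -> " + decision);
        }
    }
}
```

In a real client you would feed this the result of getChildren on /election and then register the watch on the returned name; those calls are left out here so the sketch stays self-contained.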
For example, three servers A, B and C try to acquire leadership by each creating an ephemeral znode guid-n_X, where X is the sequence number ZooKeeper assigns. Let's say B gets there first and creates /election/guid-n_0, followed by C (/election/guid-n_1) and A (/election/guid-n_2). After creating its znode, server B calls getChildren and sees that its znode has the smallest sequence number, so it is the leader. The other two servers follow the same procedure and learn that they are not the leader. Each of them, however, sets a single watch on the znode just before its own. This avoids the herd effect and still lets them find out when the old leader has died and they should take over. So in this case server C sets a watch on /election/guid-n_0 and server A sets a watch on /election/guid-n_1. When B dies, it stops sending heartbeats to ZooKeeper, its session expires, and its ephemeral znode is deleted. Server C is then notified of this event and can act as the new leader.
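The A, B, C scenario can be replayed with a small in-memory simulation. This is only a sketch: `ElectionSimulation` and its methods are hypothetical, a `TreeMap` stands in for the children of /election, and `fail` stands in for ZooKeeper deleting an ephemeral znode after the session ends. It also shows what happens to a late joiner: it simply gets the next (larger) sequence number and queues up behind the others, with no timeout involved.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of /election; not ZooKeeper client code.
final class ElectionSimulation {
    // sequence number -> owning server, kept sorted by sequence number
    private final TreeMap<Integer, String> znodes = new TreeMap<>();
    private int nextSequence = 0;

    // Models creating a sequential ephemeral znode: returns the assigned number.
    int join(String server) {
        int sequence = nextSequence++;
        znodes.put(sequence, server);
        return sequence;
    }

    // The leader is the owner of the smallest sequence number, or null if empty.
    String leader() {
        Map.Entry<Integer, String> first = znodes.firstEntry();
        return first == null ? null : first.getValue();
    }

    // Owner of the znode that `server` would watch, or null if `server` leads.
    String watchedBy(String server) {
        for (Map.Entry<Integer, String> entry : znodes.entrySet()) {
            if (entry.getValue().equals(server)) {
                Map.Entry<Integer, String> previous = znodes.lowerEntry(entry.getKey());
                return previous == null ? null : previous.getValue();
            }
        }
        throw new IllegalArgumentException("not in the election: " + server);
    }

    // Models the ephemeral znode disappearing when the server's session ends.
    void fail(String server) {
        znodes.values().remove(server);
    }

    public static void main(String[] args) {
        ElectionSimulation election = new ElectionSimulation();
        election.join("B"); // sequence 0
        election.join("C"); // sequence 1
        election.join("A"); // sequence 2
        System.out.println("leader: " + election.leader());
        System.out.println("A watches: " + election.watchedBy("A"));
        election.fail("B");
        System.out.println("leader after B fails: " + election.leader());
    }
}
```

Running `main` should report B as the first leader, C as the server A watches, and C as the leader once B fails.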
Hope this answers your question.
Upvotes: 6