silentnights
silentnights

Reputation: 931

Paxos leader election might not terminate

I am studying the Paxos made simple paper, and I am struggling with the fact that Paxos does not guarante progress if 2 proposers are racing each other with higher proposal numbers, and as suggested in the paper, to guarantee progress, a distinguished proposer must be selected which makes him a leader.

But here comes the issue as people are suggesting to use Paxos to elect the distinguished proposer, which again requires a leader to guarantee progress.

I understand that the scenario given might be an implementation specific, for example, if the distinguished sets given to processes to chose from are ordered, I mean P1 set < P2 set.

But I want to understand in actual implementations how is this handled?

Upvotes: 2

Views: 527

Answers (1)

simbo1905
simbo1905

Reputation: 6862

The usual approach is to simply use randomised timeouts where there is a low probably of an extended leader duel. If you search for "timeout" in the paper then it mentions this.

If it takes on average X seconds for a stable leader to emerge and make progress (which we can estimate using the minimum number of message round trips) then we can simply have each node timeout at random within some interval that is a multiple of X. By using a new random number at each attempt to become a leader we have a low probability of an extended leader duel.

If we set a larger multiple of X as the upper bound of the random timeout we have a lower probability of an extended leader duel. Yet we also have a longer average time before a leader emerges. So it is a trade-off.

If an implementation needs a very fast failover we may use a low timeout random interval but try to implement a mechanism by which a leader duel resolves quickly. You can invent any arbitrary mechanism.

A simple mechanism to ensure one node has the advantage to become the leader is as follows. Every node has a unique number used to order it's ballots. During a leader duel, each node can use an exponential back-off which is scaled by its own unique number. For example, if the node number is N at each failed attempt to become the leader we can multiply the upper bound of its timeout window by 1+1/N. This means that during any duel the node with the highest N will be more aggressive in attempting to become a leader as the other nodes will back-off faster.

Upvotes: 2

Related Questions