Reputation: 52957
In order to convince oneself that the complications of standard algorithms such as Paxos and Raft are necessary, one must understand why simpler solutions aren't satisfactory. Suppose that, in order to reach consensus w.r.t a stream of events in a cluster of N machines (i.e., implement a replicated time-growing log), the following algorithm is proposed:
Whenever a machine wants to append a message to the log, it broadcasts the tuple (msg, rnd, prev)
, where msg
is the message, rnd
is a random number, and prev
is the ID of the last message on the log.
When a machine receives a tuple, it inserts msg
as a child of prev
, forming a tree.
If a node has more than one child, only the one with highest rnd
is considered valid; the path of valid messages through the tree is the main chain.
If a message is part of the main chain, and it is old enough, it is considered decided/final.
If a machine attempts to submit a message and, after some time, it isn't present on the main chain, that means another machine broadcasted a message at roughly the same time, so you re-broadcast it until it is there.
Looks simple, efficient and resilient to crashes. Would this algorithm work?
Upvotes: 0
Views: 284
Reputation: 76
I think you have a problem if a machine send two tuple in sequence and the first gets lost (package loss/corruption or whatever)
In that case, lets say machine 1 has prev elemtent id of 10 and sends two more with (msg,rnd,10)=11 and (msg,rnd,11)=12 to machine 2.
Machine 2 only receives (msg,rnd,11) but does not have prev id of 11 in its tree. Machine 3 receives both, so inserts it into the main tree.
At this time you would have a desync beetween the distributed trees.
I propose an ack for the packages after they are inserted in the tree by machine x to the sender, with him waiting for it to send the next.
This way sender needs to resend previous message to the machines that failed to ack in a given timeframe.
Upvotes: 1