Reputation: 10773
I have the following question about implementing Raft:
Consider the following scenario/implementation:
One implementation of Raft (link: https://github.com/peterbourgon/raft/) seems to work this way. I wanted to confirm whether this is fine.
Is it OK if entries are kept only in memory by the leader and the followers until they are committed? In what circumstances might this approach fail?
Upvotes: 4
Views: 1352
Reputation: 447
I think entries should be durable before committing.
Let's take Figure 8(e) of the extended Raft paper as an example. If entries are made durable only when they are committed, then:
As a result, the committed entries 2 and 4 are lost. So I think uncommitted entries should also be durable.
Upvotes: 0
Reputation: 1405
I disagree with the accepted answer.
A disk isn't mystically durable. Assuming the disk is local to the server, it can permanently fail, so writing to disk doesn't save you from that. Replication is durability, provided that the replicas live in different failure domains, which they will if you are serious about durability. Of course, a process faces many hazards that a disk doesn't (being killed by the Linux OOM killer, running out of memory in general, power loss, etc.), but a dedicated process on a dedicated machine can do pretty well, especially if the log store is, say, ramfs, so that a process restart isn't an issue.
If log storage is lost, then host identity should be lost as well. A, B, and C identify logs: new log, new ID. B "rejoining" after (potential) loss of storage is simply a buggy implementation. The new process can't claim the identity of B because it can't be sure that it has all the information that B had. The same holds when always flushing to disk: if we replaced the disk of the machine hosting B, we couldn't just restart the process configured with B's identity. That would be nonsense. In both cases it should restart as D and then ask to join the cluster, at which point the problem of losing committed writes disappears in a puff of smoke.
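To illustrate the "new log, new ID" idea, here is a minimal Go sketch, not taken from any real Raft library. It assumes the node ID is stored in the same directory as the log, so losing the log storage also loses the identity. The names `loadOrCreateID`, `idFile`, and `demoIdentity` are hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// idFile is a hypothetical file name; it lives next to the log on purpose.
const idFile = "node-id"

// loadOrCreateID returns the ID stored in dir. If none exists (for example
// because the storage was lost), it generates a new random ID and reports
// fresh=true, meaning the node must ask to join the cluster as a new member.
func loadOrCreateID(dir string) (string, bool, error) {
	path := filepath.Join(dir, idFile)
	data, err := os.ReadFile(path)
	if err == nil {
		return string(data), false, nil
	}
	if !errors.Is(err, os.ErrNotExist) {
		return "", false, err
	}
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", false, err
	}
	id := hex.EncodeToString(buf)
	if err := os.WriteFile(path, []byte(id), 0o600); err != nil {
		return "", false, err
	}
	return id, true, nil
}

// demoIdentity restarts a node once with its storage intact and once after
// the storage was wiped. It reports whether the ID was kept in the first
// case and replaced in the second.
func demoIdentity() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "raft-id-sketch")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	first, _, err := loadOrCreateID(dir)
	if err != nil {
		return false, false, err
	}
	again, fresh, err := loadOrCreateID(dir) // plain process restart
	if err != nil {
		return false, false, err
	}
	if err := os.Remove(filepath.Join(dir, idFile)); err != nil { // storage lost
		return false, false, err
	}
	after, freshAfter, err := loadOrCreateID(dir)
	if err != nil {
		return false, false, err
	}
	return again == first && !fresh, after != first && freshAfter, nil
}

func main() {
	kept, replaced, err := demoIdentity()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ID kept across restart:", kept)
	fmt.Println("ID replaced after storage loss:", replaced)
}
```

A real cluster would still need a membership-change step to admit the new ID and retire the old one; that part is outside this sketch.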
Upvotes: 4
Reputation: 10773
I found the answer to this question by posting to the raft-dev Google group. I have added the answer here for reference.
See: https://groups.google.com/forum/#!msg/raft-dev/_lav2NeiypQ/1QbUB52fkggJ
Quoting Diego's answer:
For safety even in the face of correlated power outages, a majority of servers needs to have persisted the log entry before its effects are externalized. Any less than a majority and those servers could permanently fail, resulting in data loss/corruption.
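As a rough illustration of "persisted before its effects are externalized", here is a minimal Go sketch of a follower that acknowledges an entry only after the write has been synced to its log file. It is not code from peterbourgon/raft; `appendDurably`, `handleAppend`, and `runDemo` are hypothetical names, and the one-entry-per-line file format is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// appendDurably appends one entry to the log file and calls Sync so the
// bytes are flushed to stable storage before the function returns.
func appendDurably(path string, entry []byte) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	if _, err := f.Write(entry); err != nil {
		return err
	}
	if _, err := f.Write([]byte{'\n'}); err != nil {
		return err
	}
	return f.Sync()
}

// handleAppend stands in for a follower's AppendEntries handler: it returns
// true (the ack counted toward the quorum) only if the entry was synced.
func handleAppend(path string, entry []byte) bool {
	return appendDurably(path, entry) == nil
}

// runDemo writes one entry into a temporary log and reads it back.
func runDemo() (bool, string, error) {
	dir, err := os.MkdirTemp("", "raft-log-sketch")
	if err != nil {
		return false, "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "log")
	acked := handleAppend(path, []byte("x=1"))
	data, err := os.ReadFile(path)
	if err != nil {
		return false, "", err
	}
	return acked, string(data), nil
}

func main() {
	acked, contents, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("acked=%v log=%q\n", acked, contents)
}
```

On POSIX systems a newly created file may also need its parent directory synced before the file itself is guaranteed to survive a crash; the sketch leaves that out.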
Quoting from Ben Johnson's answer to my email regarding the same:
No, a server has to flush entries to disk before being considered part of the quorum.
For example, let's say you have a cluster of nodes called A, B, & C where A is the leader.
Node A replicates an entry to Node B.
Node B stores entry in memory and responds to Node A.
Node A now has a quorum and commits the entry.
Node A then gets partitioned away from Node B & C.
Node B then dies and loses the in-memory copy of the entry.
Node B comes back up.
When Node B & C then go to elect a leader, the "committed" entry will not be in their log.
When Node A rejoins the cluster, it will have an inconsistent log. The entry will have been committed and applied to the state machine so it can't be rolled back.
Ben
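The scenario in the quoted email can be replayed with a small in-memory simulation. This is only a sketch and not part of either quoted reply: the `node` type, its `durable` flag, and `committedEntrySurvives` are hypothetical, and the "disk" is modeled as a slice that survives a simulated crash.

```go
package main

import "fmt"

// node models one server. Entries in memory are lost on a crash; entries
// on disk are reloaded on restart.
type node struct {
	name    string
	durable bool // hypothetical flag: flush each entry before acknowledging
	memory  []string
	disk    []string
}

// accept stores an entry and, if the node is durable, also "flushes" it.
func (n *node) accept(entry string) {
	n.memory = append(n.memory, entry)
	if n.durable {
		n.disk = append(n.disk, entry)
	}
}

// crashAndRestart drops the in-memory log and reloads what was on disk.
func (n *node) crashAndRestart() {
	n.memory = append([]string(nil), n.disk...)
}

// has reports whether the node currently holds the entry.
func (n *node) has(entry string) bool {
	for _, e := range n.memory {
		if e == entry {
			return true
		}
	}
	return false
}

// committedEntrySurvives replays the steps from the email and reports
// whether the entry committed by A is still known to B or C afterwards.
func committedEntrySurvives(durable bool) bool {
	a := &node{name: "A", durable: durable}
	b := &node{name: "B", durable: durable}
	c := &node{name: "C", durable: durable}

	a.accept("x=1") // A (leader) appends the entry
	b.accept("x=1") // B acknowledges; A and B form a quorum, A commits
	// A is now partitioned away from B and C; C never received the entry.
	b.crashAndRestart() // B dies and comes back up

	// B and C elect a leader between themselves.
	return b.has("x=1") || c.has("x=1")
}

func main() {
	fmt.Println("in-memory only:", committedEntrySurvives(false)) // prints "in-memory only: false"
	fmt.Println("flushed before ack:", committedEntrySurvives(true)) // prints "flushed before ack: true"
}
```

With in-memory acknowledgements, the committed entry exists only on the partitioned leader A, so the new leader elected by B and C cannot know about it.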
Upvotes: 3