Van Yu

Reputation: 129

distributed storage: why is the default number of copies 3 instead of 2?

In distributed storage, to avoid data disasters, we need multiple copies of the data. But why is the default number of copies 3 rather than 2?

Two copies would halve the redundancy overhead: one extra copy of the data instead of two.

What is the main reason for choosing 3 copies?

Upvotes: 1

Views: 454

Answers (3)

viki.omega9

Reputation: 335

Adding to Michael's answer in this question: three is chosen because it provides a very simple level of fault tolerance. This is called 't-fault tolerance', where t is 1. That is, at most one of those data copies can go stale, corrupt, or otherwise wrong without bringing down the system.

t is usually chosen beforehand as an SLA for the system in question, or via empirical evidence. Given a value of t, one needs 2t + 1 copies to tolerate t crash faults by majority vote; tolerating t truly Byzantine (arbitrarily misbehaving) copies would take 3t + 1.
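
To make that arithmetic concrete, here is a minimal sketch (the function names are illustrative, not from any real system) of the replica counts for the two failure models:

```python
def replicas_for_crash_faults(t: int) -> int:
    """Copies needed to tolerate t crash (fail-stop) faults
    while still having a majority quorum: 2t + 1."""
    return 2 * t + 1


def replicas_for_byzantine_faults(t: int) -> int:
    """Copies needed to tolerate t Byzantine (arbitrary) faults: 3t + 1."""
    return 3 * t + 1


# For t = 1: three copies survive one crashed replica,
# but four would be needed if that replica could also lie.
print(replicas_for_crash_faults(1))      # 3
print(replicas_for_byzantine_faults(1))  # 4
```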

Upvotes: 0

Michael Deardeuff

Reputation: 10697

When you keep two copies of data and they differ, which version do you choose? The third copy acts as a tie-breaker.

As to why they would differ, if one computer were down for a bit—or even if they can't talk to each other—their data would differ unless the system stops accepting writes. With three computers, though, if one is down or separated from the others, the other two can still accept data without fear of the scenario in the first paragraph. (Unless you have correlated failures, which you should still plan for.)


Update. Generally you'll find that distributed algorithms use a quorum-based system for ensuring writes. In most it's a simple majority, meaning that at least floor(n/2) + 1 of the nodes must have the value before it is durably written. After that, you are guaranteed that nothing can un-write the value, because you cannot assemble a second, disjoint set of floor(n/2) + 1 nodes to oust the decision. In a two-node system floor(n/2) + 1 = 2, so if one of the nodes goes down you cannot accept a write anymore. But in a three-node system floor(n/2) + 1 = 2 as well, so one node can go down and the system can still accept writes.
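
For concreteness, here is a minimal sketch of the majority-quorum check described above (function names are illustrative, not taken from any particular system):

```python
def majority(n: int) -> int:
    """Smallest majority of n nodes: floor(n/2) + 1."""
    return n // 2 + 1


def can_accept_writes(total_nodes: int, nodes_up: int) -> bool:
    """A write is durable only once a majority of all nodes acknowledge it."""
    return nodes_up >= majority(total_nodes)


print(majority(2), can_accept_writes(2, 1))  # 2 False: one node down blocks all writes
print(majority(3), can_accept_writes(3, 2))  # 2 True: one node down is tolerable
```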

Really it's a question of durability vs. cost vs. latency. The more nodes you throw at your system, the less likely you are to lose data. One node is fairly ephemeral; two nodes are slightly less ephemeral. Three nodes is pretty good, and many systems stop there. But systems that need higher durability will require 5, 7, or 9 nodes.
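
As a back-of-the-envelope sketch of that durability trade-off, assuming each node fails independently with a hypothetical probability p (correlated failures, as noted above, break this assumption):

```python
# Probability that *all* replicas are lost at once, assuming each node
# fails independently with probability p during some repair window.
p = 0.01  # hypothetical per-node failure probability
for k in (1, 2, 3, 5):
    print(f"{k} copies -> loss probability {p ** k:.0e}")
# 1 copies -> loss probability 1e-02
# 2 copies -> loss probability 1e-04
# 3 copies -> loss probability 1e-06
# 5 copies -> loss probability 1e-10
```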

I work on one of the most reliable systems on the internet, and we use 5 nodes in the quorum with up to 16 more nodes as hot backups. For us the cost is small compared to the required durability; we chose 5 nodes in the quorum for latency's sake, with the backups providing a small boost in durability and taking some read pressure off the quorum.

Upvotes: 4

Grogi

Reputation: 2255

Because the increase in cost is small compared to the significant improvement in redundancy.

Upvotes: 0
