machunter

Reputation: 967

Why does a mongodb replica set need an odd number of voting members?

I find the replica set requirement a bit confusing, and I'm probably missing something obvious (like the conditions under which elections happen).

I understand that in normal operation you need a quorum: a vote takes place, and to get a majority you need an odd number of machines.

But since we use a replica set for failover, if the master dies we are left with an even number of voting members, which in my limited experience lengthens the time to elect a primary.

Also, according to the documentation, adding a voting member doesn't start an election, so it would seem that starting (booting) your replica set with an even number of nodes would make more sense?

So if we start with, say, 4 machines in the replica set and one machine dies, there is a re-election with 3 machines and a fast quorum. We then add a machine back, with no re-election, and we are back to our normal operating conditions.

Can someone shed some light on this?

Upvotes: 4

Views: 1182

Answers (1)

mnemosyn

Reputation: 46301

TL;DR: In single-master systems, a partition into two equal halves makes it impossible to determine which side still has a majority, taking both sides down.

Let N be a cluster of four machines:

  • One machine dies, the others resume operation. Good.
  • Two machines die, we're offline because we no longer get a majority. Bad.

Let M be a cluster of three machines:

  • One machine dies, the others resume operation. Good.
  • Two machines die, we're offline because we no longer get a majority. Bad.

=> Same result at 3/4 of the cost.
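
To see why the fourth machine buys nothing, here is a minimal sketch (plain arithmetic, not MongoDB-specific) of the majority threshold per replica set size:

    # Majority threshold for n voting members: floor(n/2) + 1.
    # "tolerated" = how many members can fail while a primary can still be elected.
    for n in range(3, 8):
        majority = n // 2 + 1
        tolerated = n - majority
        print(f"{n} members -> majority {majority}, failures tolerated {tolerated}")

    # 3 members -> majority 2, failures tolerated 1
    # 4 members -> majority 3, failures tolerated 1   (the 4th vote buys nothing)
    # 5 members -> majority 3, failures tolerated 2

An even member count never increases the number of failures you can survive.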

Now, let's add an assumption or two:

  • We're also going to operate some kind of server application that uses the database
  • The network can be partitioned

Let's say you have two data centers: the main one with two database instances and the backend server machines, and a backup center with one MongoDB instance. If the connection to the backup center fails, you're still online.

Now if you added a second MongoDB instance at the backup data center, a network partition would, despite the seemingly higher redundancy, yield lower availability: neither side would have a majority, so neither could continue to operate.

=> Less availability at higher cost. But that doesn't answer the question yet.
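
To make the data center arithmetic concrete, here is a small sketch (the DC names and vote counts are illustrative):

    # Which side of a partition can still elect a primary?
    def sides_with_majority(layout):
        total = sum(layout.values())
        majority = total // 2 + 1
        return [dc for dc, votes in layout.items() if votes >= majority]

    print(sides_with_majority({"main DC": 2, "backup DC": 1}))  # ['main DC']
    print(sides_with_majority({"main DC": 2, "backup DC": 2}))  # [] -- nobody wins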

Let's say you're really worried about availability: you have two data centers, with backend servers in both, anycast IPs, the whole deal. Now the network between the two DCs is partitioned, but some clients connect to DC A while others reach DC B. How do you determine which data center may accept writes? It's not possible, and this is why the odd number is necessary.

You don't actually need anycast IPs, BGP, or any fancy stuff for the problem to become real: any application that writes (a worker, a stale request, anything) would require merging the divergent writes later, which is a completely different concurrency scheme.
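
As a final illustration, a hypothetical sketch of why accepting writes on both sides of an even split is so costly: once the partition heals, the two histories conflict and no single-master rule can pick a winner.

    # Hypothetical writes accepted independently on each side of a 2-2 split.
    side_a = {"user:1": 50}   # DC A's primary applied this write
    side_b = {"user:1": 80}   # DC B's primary applied this one

    conflicts = [k for k in side_a if k in side_b and side_a[k] != side_b[k]]
    print(conflicts)  # ['user:1'] -- needs an application-level merge policy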

Upvotes: 1
