oatcrunch
Reputation: 453

Simple explanation of the Arbiter's role in a MongoDB replica set

I came across the official MongoDB site explaining that a replica set should have an odd number of members. I also saw the term Arbiter on the same site; as I understand it, an arbiter will never be elected primary, but it does participate in elections (from https://docs.mongodb.com/manual/core/replica-set-arbiter/).

There is also a related post, "Why do we need an 'arbiter' in MongoDB replication?", which brings in the CAP theorem and makes things even more complicated.

First of all, why do we need to make the number of members odd? Also, can someone explain in simple layman's English what an Arbiter is and what role it plays in a replica set?

Thanks in advance.

Upvotes: 3

Views: 2034

Answers (2)

Vince Bowdren
Reputation: 9208

In short: it is to stop the two normal nodes of the replica set getting into a split-brain situation if they lose contact with each other.

MongoDB replica sets are designed so that, if one or more members go down or lose contact, the remaining members can keep going as long as they hold a majority between them. The majority clause is important: without it, the network could be split in two, with the nodes on each side of the partition believing that they are still carrying on the replica set, and the two sides would end up with different sets of data.

So to avoid the split brain problem, the nodes of a replica set will not continue if they can't command an absolute majority. An example of this is if you have two nodes, in a replica set like this:

[Image: 2-node replica set in normal running]

If they lose communication, the outcome is symmetrical:

[Image: 2-node replica set after network partition]

Each one will reason the same way:

  • realise it has lost communication with the other
  • assess whether it is possible to keep the replica set going
  • realise that 1 node (out of 2) does not constitute a majority
  • revert to Secondary mode
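As a rough sketch of that reasoning (the function name `has_majority` is made up for illustration and is not part of MongoDB), the majority test each node applies could be written like this:

```python
def has_majority(reachable_votes: int, total_votes: int) -> bool:
    """Return True if the reachable voting members form a strict majority."""
    return 2 * reachable_votes > total_votes


# After the partition, each node of the 2-node set can only count itself.
print(has_majority(1, 2))  # False: 1 out of 2 is not a majority
# Before the partition, both nodes can see each other.
print(has_majority(2, 2))  # True
```

Both nodes evaluate the same numbers, so both reach the same answer and neither stays primary.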

The difference an Arbiter makes

If there is a third node, then even if the two main nodes lose contact with each other, one of them will still be in contact with the arbiter. This allows the two main nodes to make different decisions, and keeps the replica set going while avoiding the split-brain problem.

Consider the following example of a 3-node replica set:

[Image: 3-member replica set in normal running]

Whichever way the network partition goes, one node will still be in contact with the arbiter; for example like this:

[Image: 3-member replica set after network partition]

Node A will:

  • realise it can contact neither node B nor the arbiter
  • assess whether it is possible to keep the replica set going
  • realise that 1 node (out of 3) does not constitute a majority
  • revert to Secondary mode

Whereas node B is able to react differently:

  • realise it cannot contact node A, but still has contact with the arbiter
  • assess whether it is possible to keep the replica set going
  • realise that 2 nodes (out of 3) do constitute a majority
  • take over as Primary
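The two decisions above can be sketched as a small simulation. This is a simplified model, not MongoDB's actual election protocol (which also takes priorities and replication state into account); the member names and the `elect_primary` helper are hypothetical.

```python
from typing import Optional, Set

# Hypothetical 3-member set: two data-bearing nodes plus one arbiter.
# The arbiter has a vote but holds no data, so it can never be primary.
MEMBERS = {"A": "data", "B": "data", "arbiter": "arbiter"}


def elect_primary(side: Set[str]) -> Optional[str]:
    """Return a member that may act as primary on this side of a partition."""
    if 2 * len(side) <= len(MEMBERS):
        return None  # no strict majority of votes: stay secondary
    candidates = sorted(m for m in side if MEMBERS[m] == "data")
    return candidates[0] if candidates else None


print(elect_primary({"A"}))             # None: 1 vote out of 3
print(elect_primary({"B", "arbiter"}))  # B: 2 votes out of 3
```

Only one side of any partition can hold a majority, so at most one primary exists at a time.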

This also illustrates how you should deploy an arbiter to get that benefit:

  1. try to put the arbiter on a system independent of both the data-bearing nodes, to maximise the chance of it still being able to communicate with either throughout network problems
  2. it doesn't need to store data, so you don't need high-spec hardware
  3. just one arbiter is enough to break the deadlock; you don't get any benefit from multiple arbiters
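For illustration, here is roughly what the configuration document for such a set looks like, written as a Python dict. The set name and hostnames are placeholders I made up; the `arbiterOnly` field is what marks a member as an arbiter. A document of this shape is what the `replSetInitiate` admin command expects, though this sketch only builds and inspects it and does not talk to a server.

```python
# Hypothetical set name and hostnames; replace with your own.
replica_set_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "node-a.example.net:27017"},
        {"_id": 1, "host": "node-b.example.net:27017"},
        # arbiterOnly marks a voting member that stores no data.
        {"_id": 2, "host": "arbiter.example.net:27017", "arbiterOnly": True},
    ],
}

# List the members that are arbiters; there should be exactly one.
arbiters = [m["host"] for m in replica_set_config["members"] if m.get("arbiterOnly")]
print(arbiters)  # ['arbiter.example.net:27017']
```

Note that the arbiter has its own host, separate from both data-bearing nodes, in line with point 1 above.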

Upvotes: 5

Treefish Zhang
Reputation: 1161

Take the example of a 2-member replica set: in the event of a network partition, i.e. the two members lose touch with each other, who gets to become the primary? There will be a tie, and a need for a tie-breaker. That is not the case with a 3-member replica set: the group that contains two nodes wins, and one of them becomes primary. That is the basis of the requirement for an odd number of nodes in a replica set.

As for an arbiter, it is lightweight, so I guess one can save money by running it on a smaller machine: we do not expect it to hold any data, and we just need it to be present to vote for a primary.
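To see why an even member count buys nothing, here is a small worked calculation (a sketch; the helper names are mine). The majority of n voting members is n // 2 + 1, and the number of members that can be lost while still keeping a majority is n minus that.

```python
def majority(n: int) -> int:
    """Smallest number of votes that is more than half of n."""
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    """How many voting members can be lost while a majority remains."""
    return n - majority(n)


for n in range(2, 6):
    print(n, majority(n), fault_tolerance(n))
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

Going from 3 to 4 members still tolerates only one failure, so the extra data-bearing node adds cost without adding resilience; with 2 data nodes, adding an arbiter is the cheap way to reach 3 votes.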

Upvotes: 2
