MongoDB consistency in case master goes down

Question

Suppose we have a 3-node replica set (1m, 2s, 3s). We are going to apply a few sequential updates to the same document:

1. update1 {writeConcern: "majority"} -> suppose it was applied on nodes 1m and 2s.
2. update2 {writeConcern: "majority"} -> suppose it was applied on nodes 1m and 3s.
3. Shut down the master immediately (assume the oplog has not been fully synced yet).

Questions:

Which node is going to be elected primary?
What state of the document will the very next find operation read (readConcern = "majority" / "local")?
What happens when the master comes back in this case?

kevinadi · Accepted Answer

Steps 1 and 2 are mutually exclusive. One of them will happen, but not both. This is because MongoDB's oplog is sequential in nature.

To elaborate, let's assume step 1 and step 2 are sequential in time.

After step 1, the oplog contains: Start -> update1

After step 2, the oplog contains: Start -> update1 -> update2
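The sequential nature of the oplog can be sketched with a toy model. This is not MongoDB code: `replicate` and `is_prefix` are hypothetical helpers, and the only assumption is that a secondary copies oplog entries strictly in order, so its oplog is always a prefix of the primary's.

```python
# Toy model of an ordered oplog (illustrative only, not the MongoDB implementation).
primary_oplog = ["update1", "update2"]


def replicate(primary, secondary, max_entries):
    """Copy up to max_entries missing entries, in order, from primary to secondary."""
    start = len(secondary)
    secondary.extend(primary[start:start + max_entries])
    return secondary


def is_prefix(shorter, longer):
    """True if `shorter` equals the first len(shorter) entries of `longer`."""
    return longer[:len(shorter)] == shorter


# 2s managed to copy both entries; 3s only had time for one.
secondary_2s = replicate(primary_oplog, [], max_entries=2)
secondary_3s = replicate(primary_oplog, [], max_entries=1)

# 3s can hold update1 without update2, but never update2 without update1.
print(secondary_2s, secondary_3s, is_prefix(secondary_3s, primary_oplog))
```

Under this model, the state "3s has update2 but not update1" from the question cannot arise.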

After step 3, whichever surviving node contains the latest data (it could be update1 only, or both update1 and update2; it doesn't matter which) will be elected primary.

The next read from the set will return the latest update that managed to get replicated before the primary went offline. Of course, it is entirely possible that neither update gets replicated, in which case the next read returns the Start state. The point is, the set will never get confused regarding the sequence of updates.
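The three possible outcomes can be walked through with a small simulation. It is a deliberate simplification: `elect` and `read_state` are hypothetical helpers, and the election is reduced to "the survivor with the longest oplog wins", which ignores priorities, terms, and the rest of the real election protocol.

```python
# Toy model of failover (illustrative only, not the real election protocol).

def elect(survivors):
    """Pick the surviving node with the longest oplog; ties go to the first listed."""
    return max(survivors, key=lambda name: len(survivors[name]))


def read_state(oplog):
    """The document reflects the last applied entry, or 'Start' if none was applied."""
    return oplog[-1] if oplog else "Start"


scenarios = [
    {"2s": ["update1"], "3s": ["update1", "update2"]},  # both updates replicated
    {"2s": ["update1"], "3s": []},                      # only update1 replicated
    {"2s": [], "3s": []},                               # nothing replicated
]

results = []
for survivors in scenarios:
    new_primary = elect(survivors)
    results.append((new_primary, read_state(survivors[new_primary])))

print(results)
```

In every scenario the read returns a state that lies on the single Start -> update1 -> update2 sequence, never a mix.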

If the new primary contains only update1 (i.e. update2 did not manage to get replicated), then when the old primary comes back online, it will go into the ROLLBACK state, where it will remove all traces of update2. Thus, it is possible to lose update2 if the timing is just right.
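A rough sketch of what the rollback amounts to, again as a toy model: `rollback` is a hypothetical helper that finds the last oplog entry both nodes share and discards everything the old primary wrote after it. The `update3` entry is an assumed write accepted by the new primary while the old one was down.

```python
# Toy model of ROLLBACK (illustrative only, not the MongoDB implementation).

def rollback(old_primary_oplog, new_primary_oplog):
    """Return (kept, rolled_back) entries for the old primary."""
    common = 0
    for mine, theirs in zip(old_primary_oplog, new_primary_oplog):
        if mine != theirs:
            break
        common += 1
    return old_primary_oplog[:common], old_primary_oplog[common:]


old_primary = ["update1", "update2"]   # update2 never left the old primary
new_primary = ["update1", "update3"]   # update3: hypothetical write after failover

kept, rolled_back = rollback(old_primary, new_primary)

# After rolling back, the old primary catches up from the new primary.
resynced = kept + new_primary[len(kept):]

print(kept, rolled_back, resynced)
```

Here `rolled_back` holds exactly update2, which matches the case described above.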

To avoid rollbacks in most cases, you would need to perform your writes with write concern majority, so that update2 will be replicated to the majority of voting nodes, minimizing the chance that it will be rolled back.
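The arithmetic behind this can be checked with a short sketch, assuming all three members of the set are voting members. `majority` is a hypothetical helper, not a driver call.

```python
# Why a majority-acknowledged write survives the loss of the primary
# (toy arithmetic, assuming 3 voting members as in the question).
from itertools import combinations


def majority(voting_members):
    """Smallest number of members that is more than half of the set."""
    return voting_members // 2 + 1


nodes = ["1m", "2s", "3s"]
needed = majority(len(nodes))

# Every group of `needed` nodes that could have acknowledged the write.
possible_ack_sets = [set(c) for c in combinations(nodes, needed)]

# The primary always holds its own write, so consider the groups containing 1m
# and see which nodes still hold the write once 1m is shut down.
survivors_with_write = [acked - {"1m"} for acked in possible_ack_sets if "1m" in acked]

print(needed, survivors_with_write)
```

Whichever secondary acknowledged the write, at least one surviving node still has it, and that node is the most up to date one when the election happens.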

The scenario you described is similar to a "split brain" scenario, where it's impossible to tell which update is the correct one. MongoDB's replica set protocol was specifically designed to avoid this situation.
