inteloid
inteloid

Reputation: 737

MongoDB writeConcern for production

My question may sound too general, but I'm ready to give any missing data.

We make something like a social network. In order to make read performance better and to ease the life of master instance, we've set

readPreference=secondaryPreferred

in our replicaSet. But with this, there's no guarantee that the data is written to secondary instances before you read from there, so we had to set

w=3

option. So far, everything seems to be working but measurements on my local replicaSet show the following insert statistics.

Inserting 300 objects:
w=1 - 0.10s
w=3 - 1.31s
Insertion 5000 objects:
w=1 - 0.6s
w=3 - 14.6s

The question is, is this difference expected, or I'm doing something wrong?

Upvotes: 1

Views: 568

Answers (1)

Dylan Tong
Dylan Tong

Reputation: 657

The difference in performance is expected because w=3 means that you want to wait for acknowledgement that data was successfully replicated to at least two of your secondaries in addition to the acknowledgement from your primary (w=1).

For clarity, w = 1 simply means that you want an acknowledgement from the primary that an operation was completed. Any errors such as duplicate key errors or network errors that occur would be reported back as part of the acknowledgement if occurred.

http://docs.mongodb.org/manual/reference/write-concern/

Refer to the link above, and you can see there are lower write concerns that let you trade safety for lower latency.

If you want higher level of durability or safety, then you might use j=1 to wait for an acknowledgement that your operation was written to the journal (allowing recovery from a failure). w > N increases safety by waiting for acknowledgement from > N replica members to ensure that your operation was successfully replicated to other members. So to be clear, w > 1 isn't necessary to instruct the driver to write to the replicas. If you decided to use w=N, be aware that you can get yourself in a bad situation if replica set members fail and fall below N. w = majority is a more flexible option.

Lastly, you may want to re-evaluate why you're reading from the secondaries. Secondaries are eventually consistent as MongoDB uses asynchronous replication. If you're expecting consistent reads, then it makes more sense to read from the primary. If your reason to read from the secondary is for scaling, you should consider sharding as this is the primary mechanism for scale-out. Distributing load on secondaries rarely improve scalability. Operations are replicated over to replicas, so you're not gaining much from a lower write load. Sometimes it makes sense to for distributing different types of workloads (may lead to better memory utilization). For instance, running a MR job on a secondary might make sense. Replica sets are primarily for high availability-- fault tolerance providing automatic fail-overs and network partition issues.

Upvotes: 3

Related Questions