Pierre-Louis Gottfrois

Reputation: 17631

Mongodb: How to avoid locking on big collection updates

I have an events collection of 2,502,011 elements and would like to perform an update on all elements. Unfortunately I am facing a lot of MongoDB page faults due to the write lock.

Question: How can I avoid those faults in order to be sure that all my events are correctly updated?

Here is the information regarding my events collection:

> db.events.stats()
{
    "count" : 2502011,
    "size" : 2097762368,
    "avgObjSize" : 838.4305136947839,
    "storageSize" : 3219062784,
    "numExtents" : 21,
    "nindexes" : 6,
    "lastExtentSize" : 840650752,
    "paddingFactor" : 1.0000000000874294,
    "systemFlags" : 0,
    "userFlags" : 0,
    "totalIndexSize" : 1265898256,
    "indexSizes" : {
        "_id_" : 120350720,
        "destructured_created_at_1" : 387804032,
        "destructured_updated_at_1" : 419657728,
        "data.assigned_author_id_1" : 76053152,
        "emiting_class_1_data.assigned_author_id_1_data.user_id_1_data.id_1_event_type_1" : 185071936,
        "created_at_1" : 76960688
    }
}

Here is what an event looks like:

> db.events.findOne()
{
  "_id" : ObjectId("4fd5d4586107d93b47000065"),
  "created_at" : ISODate("2012-06-11T11:19:52Z"),
  "data" : {
    "project_id" : ObjectId("4fc3d2abc7cd1e0003000061"),
    "document_ids" : [
      "4fc3d2b45903ef000300007d",
      "4fc3d2b45903ef000300007e"
    ],
    "file_type" : "excel",
    "id" : ObjectId("4fd5d4586107d93b47000064")
  },
  "emiting_class" : "DocumentExport",
  "event_type" : "created",
  "updated_at" : ISODate("2013-07-31T08:52:48Z")
}

I would like to update each event to add 2 new fields based on the existing created_at and updated_at. Please correct me if I am wrong but it seems you can't use the mongo update command when you need to access the current element's data along the way.

This is my update loop:

db.events.find().forEach(
  function (e) {
    // parse the existing timestamps
    var created_at = new Date(e.created_at);
    var updated_at = new Date(e.updated_at);

    // derive the new fields from the parsed dates
    e.destructured_created_at = [e.created_at]; // omitted the actual values
    e.destructured_updated_at = [e.updated_at]; // omitted the actual values
    db.events.save(e); // rewrites the whole document
  }
)

When running the above command, I get a huge number of page faults due to the write lock on the database.

(mongostat output not shown)

Upvotes: 4

Views: 2852

Answers (1)

Sammaye

Reputation: 43884

I think you are confused here: it is not the write lock causing the faults, it is MongoDB querying for the documents you want to update. The lock is not held during a page fault (in fact it is only held when actually updating, or rather saving, a document on disk); it yields to other operations.

The lock is more of a mutex in MongoDB.

Page faults on this amount of data are perfectly normal; since you obviously do not query this data often, I am not sure what you are expecting to see. I am definitely unsure what you mean by your question:

Question: How can I avoid those faults in order to be sure that all my events are correctly updated?

OK, the problem you may be seeing is page thrashing on that machine, which in turn destroys your IO bandwidth and floods your working set with data that is not needed. Do you really need to add these fields to ALL documents eagerly? Could they not be added on demand by the application when that data is used again?
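For illustration, a minimal sketch of that on-demand approach in shell-style JavaScript, assuming a hypothetical loadEvent helper in your application and assuming the new fields are simply absent on unmigrated documents (none of these names come from the question):

// Hypothetical application-side helper: migrate an event lazily,
// so only documents that are actually read pay the write cost.
function loadEvent(id) {
  var e = db.events.findOne({ _id: id });
  if (e && e.destructured_created_at === undefined) {
    e.destructured_created_at = [e.created_at]; // actual values omitted, as in the question
    e.destructured_updated_at = [e.updated_at];
    db.events.update({ _id: e._id }, { $set: {
      destructured_created_at: e.destructured_created_at,
      destructured_updated_at: e.destructured_updated_at
    } });
  }
  return e;
}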

Another option is to do this in batches.
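A minimal sketch of the batched variant, again assuming the new fields are absent on documents that have not been migrated yet; the batch size and sleep interval are arbitrary choices:

// Process the collection in small batches, using $set so only the new
// fields are written instead of rewriting the whole document.
var batchSize = 1000;
var more = true;
while (more) {
  var cursor = db.events.find({ destructured_created_at: { $exists: false } }).limit(batchSize);
  more = cursor.hasNext();
  cursor.forEach(function (e) {
    db.events.update({ _id: e._id }, { $set: {
      destructured_created_at: [e.created_at], // actual values omitted, as in the question
      destructured_updated_at: [e.updated_at]
    } });
  });
  sleep(1000); // pause between batches so other operations get the lock
}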

One feature you could make use of here would be priority queues, which would mark such an update as a background task that shouldn't affect the current workings of your mongod too much. I hear such a feature is due (can't find the JIRA :/).

Please correct me if I am wrong but it seems you can't use the mongo update command when you need to access the current element's data along the way.

You are correct.

Upvotes: 6
