Reputation: 147
I have a collection with around 2.5 million documents, and I need to add two array fields to every record in it. However, my update operation takes a long time to complete. Below is my query:
db.products.update({
    "code": {
        "$nin": [
            "Tvs",
            "Lg",
            "Roots",
            "Mix",
            "A10",
            "PTPL",
            "Philips",
            "FireFox",
            "Akkade"
        ]
    }
},
{
    "$push": {
        "rights": "Read",
        "Acc": "K23424"
    }
},
false,
true)
The update above takes around 2 minutes to run over the entire collection. Is there any other way to optimize this update query?
Edit:
Including the explain plan result:
{
    "cursor" : "BtreeCursor code_1",
    "isMultiKey" : false,
    "n" : 106192,
    "nscannedObjects" : 106192,
    "nscanned" : 106197,
    "nscannedObjectsAllPlans" : 106192,
    "nscannedAllPlans" : 106197,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 829,
    "nChunkSkips" : 0,
    "millis" : 275,
    "indexBounds" : {
        "code" : [
            [
                {
                    "$minElement" : 1
                },
                "adfgfgg"
            ],
            [
                "5def354",
                "akargfdc"
            ],
            [
                "34aka545dc",
                "akags"
            ],
            [
                "354erak53ag345s",
                "ci45t45r6rg"
            ],
            [
                "cyuikitryui7ixsg",
                "gp"
            ],
            [
                "gp",
                "gslbansgrp"
            ],
            [
                "gsl7878nsgrp",
                "l7hrgyn"
            ],
            [
                "l678uhn",
                "l6yup"
            ],
            [
                "lfghrhyhp",
                "radwaregslbsg"
            ],
            [
                "radwaregslbsg",
                "radwaregslbsgrs"
            ],
            [
                "radwaregslbsgrs",
                "radwarers"
            ],
            [
                "rargydwarers",
                "radwaresg"
            ],
            [
                "radwargyresg",
                "radwaresgrs"
            ],
            [
                "radwaresgrs",
                "slr6y6bsf"
            ],
            [
                "slbrtrtsf",
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "localhost:6789",
    "filterSet" : false,
    "stats" : {
        "type" : "FETCH",
        "works" : 106198,
        "yields" : 829,
        "unyields" : 829,
        "invalidates" : 0,
        "advanced" : 106192,
        "needTime" : 5,
        "needFetch" : 0,
        "isEOF" : 1,
        "alreadyHasObj" : 0,
        "forcedFetches" : 0,
        "matchTested" : 0,
        "children" : [
            {
                "type" : "IXSCAN",
                "works" : 106197,
                "yields" : 829,
                "unyields" : 829,
                "invalidates" : 0,
                "advanced" : 106192,
                "needTime" : 5,
                "needFetch" : 0,
                "isEOF" : 1,
                "keyPattern" : "{ code: 1.0 }",
                "isMultiKey" : 0,
                "boundsVerbose" : "field #0['code']: [MinKey, \"arhtgh10sg\"), (\"a1rgtrg0sg\", \"akadc\"), (\"akadc\", \"akags\"), (\"akags\", \"ctryitrrgyrtgyixsg\"), (\"crtytryityyrixsg\", \"gp\"), (\"gp\", \"gslytyybansgrp\"), (\"gstrytylbansgrp\", \"ln\"), (\"lrytryyn\", \"lyty5typ\"), (\"lty5ty5tp\", \"radwaregtryslbsg\"), (\"radwaregs454t45rgtlbsg\", \"radwaregslbsgrs\"), (\"radwa45654t6regslbsgrs\", \"radware46rs\"), (\"radwrfgarers\", \"rad456waresg\"), (\"r457423adwaresg\", \"radw34aresgrs\"), (\"ra5656dw5rty5aresgrs\", \"slbs6565656f\"), (\"slb66rty5rty5sf\", MaxKey]",
                "yieldMovedCursor" : 0,
                "dupsTested" : 0,
                "dupsDropped" : 0,
                "seenInvalidated" : 0,
                "matchTested" : 0,
                "keysExamined" : 106197,
                "children" : []
            }
        ]
    }
}
Regards, Roots
Upvotes: 1
Views: 3957
Reputation: 117
For anyone facing an issue like this, a good way to drastically decrease the time is to create indexes on the fields your queries filter on.
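For example, assuming the filter is on the code field as in the question above, a single-field index could be created roughly as follows (a minimal sketch; on older shells ensureIndex() is the equivalent call):
// Sketch: index the field used in the query filter so the update can locate
// matching documents via the index instead of a full collection scan.
db.products.createIndex({ "code": 1 })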
Upvotes: 0
Reputation: 103435
The issue is that the inequality operator $nin is not very selective, since it often matches a large portion of the index. As a result, in many cases a $nin query with an index may perform no better than a $nin query that must scan all documents in a collection. Think of it this way: indexes are very useful for finding things that match, not things that don't. Consider a $nin query that returns every document in a collection but one: for each document, the query first has to look in the index and then fetch the document itself. That is two "look-ups" per document, as opposed to a collection scan, which looks at each document only once. In general, if your query requires more than half of the index, you should re-examine how the query is written, or at the very least accept the collection scan. So if you do need to use $nin, it's often best to make sure that an additional, more selective criterion is part of the query. See also Query Selectivity.
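A minimal sketch of that idea, assuming a hypothetical indexed field such as brandGroup that narrows the candidate set (the field name and value are illustrative, not from the original question):
// Sketch: pair the non-selective $nin with a selective, indexed equality match
// so the index narrows the candidates before the exclusion filter is applied.
db.products.find({
    "brandGroup": "consumer",                      // selective, indexed criterion (hypothetical)
    "code": { "$nin": [ "Tvs", "Lg", "Roots" ] }   // non-selective exclusion filter
})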
You could use the Bulk API to optimise your update by including an additional query that ensures selectivity: in this case the Bulk.find().update() query contains the _id field, which is indexed by default, and an equality match on the unique _id field is highly selective since it can match at most one document. The other key to speeding up updates is to note that MongoDB gives you a lot of control over how database operations are acknowledged by the server (see the note after the example below).
The following example initializes a Bulk() operations builder for the products collection and adds multi update operations to the list of operations. It uses ordered bulk operations, which are executed in order (hence the name), halting when an error occurs:
var bulk = db.products.initializeOrderedBulkOp(),
    counter = 0,
    criteria = {
        "code": {
            "$nin": [
                "Tvs",
                "Lg",
                "Roots",
                "Mix",
                "A10",
                "PTPL",
                "Philips",
                "FireFox",
                "Akkade"
            ]
        }
    };

db.products.find().forEach(function(doc) {
    // Each bulk update targets one document by its indexed _id,
    // keeping the non-selective code criterion as a secondary filter.
    bulk.find({ "_id": doc._id, "code": criteria.code }).update({
        "$push": { "rights": "Read", "Acc": "K23424" }
    });
    counter++;
    if (counter % 1000 == 0) {
        // Execute per 1000 operations and re-initialize every 1000 update statements
        bulk.execute();
        bulk = db.products.initializeOrderedBulkOp();
    }
});

// Clean up any remaining queued operations
if (counter % 1000 != 0) {
    bulk.execute();
}
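On the acknowledgement point mentioned above, Bulk.execute() also accepts an optional write concern document, which lets you control how strongly each batch is acknowledged; a minimal sketch (the values are illustrative):
// Sketch: pass a write concern to execute(); w: 1 waits only for the primary's
// acknowledgement, and wtimeout caps the wait in milliseconds.
bulk.execute({ w: 1, wtimeout: 5000 });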
Upvotes: 1