Reputation: 147
I have a collection with around 2.5 million documents, and I need to add two array fields to every record in it. However, my update operation takes a long time to complete. Below is my query:
db.products.update({
    "code": {
        "$nin": [
            "Tvs",
            "Lg",
            "Roots",
            "Mix",
            "A10",
            "PTPL",
            "Philips",
            "FireFox",
            "Akkade"
        ]
    }
},
{
    "$push": {
        "rights": "Read",
        "Acc": "K23424"
    }
},
false,
true)
The update above takes around 2 minutes to run over the entire collection. Is there any other way to optimize this update query?
Edit:
Including the explain plan result:
{
    "cursor" : "BtreeCursor code_1",
    "isMultiKey" : false,
    "n" : 106192,
    "nscannedObjects" : 106192,
    "nscanned" : 106197,
    "nscannedObjectsAllPlans" : 106192,
    "nscannedAllPlans" : 106197,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 829,
    "nChunkSkips" : 0,
    "millis" : 275,
    "indexBounds" : {
        "code" : [
            [
                {
                    "$minElement" : 1
                },
                "adfgfgg"
            ],
            [
                "5def354",
                "akargfdc"
            ],
            [
                "34aka545dc",
                "akags"
            ],
            [
                "354erak53ag345s",
                "ci45t45r6rg"
            ],
            [
                "cyuikitryui7ixsg",
                "gp"
            ],
            [
                "gp",
                "gslbansgrp"
            ],
            [
                "gsl7878nsgrp",
                "l7hrgyn"
            ],
            [
                "l678uhn",
                "l6yup"
            ],
            [
                "lfghrhyhp",
                "radwaregslbsg"
            ],
            [
                "radwaregslbsg",
                "radwaregslbsgrs"
            ],
            [
                "radwaregslbsgrs",
                "radwarers"
            ],
            [
                "rargydwarers",
                "radwaresg"
            ],
            [
                "radwargyresg",
                "radwaresgrs"
            ],
            [
                "radwaresgrs",
                "slr6y6bsf"
            ],
            [
                "slbrtrtsf",
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "localhost:6789",
    "filterSet" : false,
    "stats" : {
        "type" : "FETCH",
        "works" : 106198,
        "yields" : 829,
        "unyields" : 829,
        "invalidates" : 0,
        "advanced" : 106192,
        "needTime" : 5,
        "needFetch" : 0,
        "isEOF" : 1,
        "alreadyHasObj" : 0,
        "forcedFetches" : 0,
        "matchTested" : 0,
        "children" : [
            {
                "type" : "IXSCAN",
                "works" : 106197,
                "yields" : 829,
                "unyields" : 829,
                "invalidates" : 0,
                "advanced" : 106192,
                "needTime" : 5,
                "needFetch" : 0,
                "isEOF" : 1,
                "keyPattern" : "{ code: 1.0 }",
                "isMultiKey" : 0,
                "boundsVerbose" : "field #0['code']: [MinKey, \"arhtgh10sg\"), (\"a1rgtrg0sg\", \"akadc\"), (\"akadc\", \"akags\"), (\"akags\", \"ctryitrrgyrtgyixsg\"), (\"crtytryityyrixsg\", \"gp\"), (\"gp\", \"gslytyybansgrp\"), (\"gstrytylbansgrp\", \"ln\"), (\"lrytryyn\", \"lyty5typ\"), (\"lty5ty5tp\", \"radwaregtryslbsg\"), (\"radwaregs454t45rgtlbsg\", \"radwaregslbsgrs\"), (\"radwa45654t6regslbsgrs\", \"radware46rs\"), (\"radwrfgarers\", \"rad456waresg\"), (\"r457423adwaresg\", \"radw34aresgrs\"), (\"ra5656dw5rty5aresgrs\", \"slbs6565656f\"), (\"slb66rty5rty5sf\", MaxKey]",
                "yieldMovedCursor" : 0,
                "dupsTested" : 0,
                "dupsDropped" : 0,
                "seenInvalidated" : 0,
                "matchTested" : 0,
                "keysExamined" : 106197,
                "children" : []
            }
        ]
    }
}
Regards, Roots
Upvotes: 1
Views: 3957
Reputation: 117
For anyone facing an issue like this, a good way to drastically decrease the time is to create indexes on the fields your queries filter on.
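For example, assuming the filter is on the code field as in the question above, a single-field index could be created roughly as follows (a minimal sketch; on older shells ensureIndex() is the equivalent call):
// Sketch: index the field used in the query filter so the update can locate
// matching documents via the index instead of a full collection scan.
db.products.createIndex({ "code": 1 })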
Upvotes: 0
Reputation: 103435
The issue is that the inequality operator $nin is not very selective, since it often matches a large portion of the index. As a result, in many cases a $nin query with an index may perform no better than a $nin query that must scan all documents in a collection. Think of it this way: indexes are very useful for finding things that match, not things that don't. Consider a $nin query that returns every document in a collection but one: for each document, the query first has to look in the index and then fetch the document itself. That is two "look-ups" per document, as opposed to a collection scan, which looks at each document only once. In general, if your query requires more than half of the index, you should re-examine how the query is written, or at the very least accept the collection scan. So if you do need to use $nin, it's often best to make sure that an additional, more selective criterion is part of the query. See also Query Selectivity.
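A minimal sketch of that idea, assuming a hypothetical indexed field such as brandGroup that narrows the candidate set (the field name and value are illustrative, not from the original question):
// Sketch: pair the non-selective $nin with a selective, indexed equality match
// so the index narrows the candidates before the exclusion filter is applied.
db.products.find({
    "brandGroup": "consumer",                      // selective, indexed criterion (hypothetical)
    "code": { "$nin": [ "Tvs", "Lg", "Roots" ] }   // non-selective exclusion filter
})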
You could use the Bulk API to optimise your update by including an additional query that ensures selectivity: in this case the Bulk.find().update() query contains the _id field, which is indexed by default, and an equality match on the unique _id field is highly selective since it can match at most one document. The other key to speeding up updates is to note that MongoDB gives you a lot of control over how database operations are acknowledged by the server (see the note after the example below).
The following example initializes a Bulk() operations builder for the products collection and adds multi update operations to the list of operations. It uses ordered bulk operations, which are executed in order (hence the name), halting when an error occurs:
var bulk = db.products.initializeOrderedBulkOp(),
    counter = 0,
    criteria = {
        "code": {
            "$nin": [
                "Tvs",
                "Lg",
                "Roots",
                "Mix",
                "A10",
                "PTPL",
                "Philips",
                "FireFox",
                "Akkade"
            ]
        }
    };

db.products.find().forEach(function(doc) {
    // Each bulk update targets one document by its indexed _id,
    // keeping the non-selective code criterion as a secondary filter.
    bulk.find({ "_id": doc._id, "code": criteria.code }).update({
        "$push": { "rights": "Read", "Acc": "K23424" }
    });
    counter++;
    if (counter % 1000 == 0) {
        // Execute per 1000 operations and re-initialize every 1000 update statements
        bulk.execute();
        bulk = db.products.initializeOrderedBulkOp();
    }
});

// Clean up any remaining queued operations
if (counter % 1000 != 0) {
    bulk.execute();
}
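On the acknowledgement point mentioned above, Bulk.execute() also accepts an optional write concern document, which lets you control how strongly each batch is acknowledged; a minimal sketch (the values are illustrative):
// Sketch: pass a write concern to execute(); w: 1 waits only for the primary's
// acknowledgement, and wtimeout caps the wait in milliseconds.
bulk.execute({ w: 1, wtimeout: 5000 });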
Upvotes: 1