Reputation: 9624
So, we use MongoDB at our workplace to store certain information about our customers in a collection named customers. For an ad-hoc task, I am required to iterate through the entire collection and do some processing on each document, which means that it is critical to scan through every document in the collection without missing any.
This is the query I am running -
db.customers.find({}, {"cid":1, "name":1})
The customers collection has an index on the cid field, and this is the result of execution-stats on the query -
"executionStages" : {
  "stage" : "PROJECTION",
  "nReturned" : 19841,
  "executionTimeMillisEstimate" : 10,
  "works" : 19843,
  "advanced" : 19841,
  "needTime" : 1,
  "needYield" : 0,
  "saveState" : 155,
  "restoreState" : 155,
  "isEOF" : 1,
  "invalidates" : 0,
  "transformBy" : {
    "cid" : 1,
    "name" : 1
  },
  "inputStage" : {
    "stage" : "COLLSCAN",
    "nReturned" : 19841,
    "executionTimeMillisEstimate" : 0,
    "works" : 19843,
    "advanced" : 19841,
    "needTime" : 1,
    "needYield" : 0,
    "saveState" : 155,
    "restoreState" : 155,
    "isEOF" : 1,
    "invalidates" : 0,
    "direction" : "forward",
    "docsExamined" : 19841
  }
}
The issue I am facing is that when I run this query, MongoDB doesn't include a few cids in the cursor that should be present. Those cids were part of the collection before the query started running. When I run the same query again at a later date, those documents are returned, but some other documents go missing.
From what I read before asking this question, it looks like reads may miss matching documents that are updated during the course of the read operation in MongoDB. The article seems to suggest, however, that this happens only when the query uses an index, not during a full collection scan, which is what I am doing. My query doesn't appear to use any index, so I expected not to run into this issue. However, it happens in my case as well.
So, two questions:
customers collection without missing any of them?
Thanks
Upvotes: 3
Views: 227
Reputation: 2452
The article you reference says that during a scan over the whole collection, a write can make a document grow so that it has to be moved, which reorders the documents in the collection. "Natural order" can therefore be volatile during iteration. The author's solution is to scan via an index, which ensures that no documents are missed while iterating the cursor.
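To make the failure mode concrete, here is a toy simulation in plain JavaScript. It is only a sketch: the slot array, the function names, and the single "move" are all made up for illustration and are not how MongoDB actually stores or scans data. It assumes one document is relocated into free space behind the scan position, and that cid itself is never modified.

```javascript
// Toy model only: an array of slots stands in for "natural order" on disk.
// null marks free space. All names here are hypothetical.
function makeSlots() {
  const doc = (cid) => ({ cid, name: "customer-" + cid });
  return [doc(1), doc(2), null, doc(3), doc(4), doc(5)];
}

// Simulates a concurrent write that relocates a document into a free slot.
function moveDocument(slots, cid, freeSlot) {
  const from = slots.findIndex((d) => d !== null && d.cid === cid);
  slots[freeSlot] = slots[from];
  slots[from] = null;
}

// Scan by storage position; the move happens right after slot `moveAfterIndex`.
function positionalScan(slots, moveAfterIndex, movedCid, freeSlot) {
  const seen = [];
  for (let i = 0; i < slots.length; i++) {
    if (slots[i] !== null) seen.push(slots[i].cid);
    if (i === moveAfterIndex) moveDocument(slots, movedCid, freeSlot);
  }
  return seen;
}

// Scan in key order: always continue with the smallest cid above the last one
// returned, regardless of where the document currently sits in storage.
function keyOrderedScan(slots, moveAfterCount, movedCid, freeSlot) {
  const seen = [];
  let last = -Infinity;
  for (;;) {
    const next = slots
      .filter((d) => d !== null && d.cid > last)
      .sort((a, b) => a.cid - b.cid)[0];
    if (!next) break;
    seen.push(next.cid);
    last = next.cid;
    if (seen.length === moveAfterCount) moveDocument(slots, movedCid, freeSlot);
  }
  return seen;
}

console.log(positionalScan(makeSlots(), 2, 4, 2)); // [ 1, 2, 3, 5 ]  (cid 4 is missed)
console.log(keyOrderedScan(makeSlots(), 2, 4, 2)); // [ 1, 2, 3, 4, 5 ]
```

In the positional scan, the document with cid 4 is moved into a slot the scan has already passed, so it is never returned. The key-ordered scan does not depend on storage position, so the same move does not affect it.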
I suggest using a stable index for the scan. In your case,
db.customers.find({}, {"cid":1, "name":1}).hint({cid: 1})
will result in an index scan being the query planner's winning plan (confirm with db.customers.find({}, {"cid":1, "name":1}).hint({cid: 1}).explain()).
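For the actual iteration, something along these lines should work in the mongo shell. Treat it as a minimal sketch: scanByCid and the handle callback are names I made up, and the duplicate check on _id is only a defensive assumption on my part, not something the article requires.

```javascript
// Sketch for the mongo shell; `collection` would be db.customers.
// `handle` is a hypothetical callback that does the per-document processing.
function scanByCid(collection, handle) {
  const seenIds = new Set();
  collection
    .find({}, { cid: 1, name: 1 })
    .hint({ cid: 1 }) // force the scan to walk the cid index
    .forEach((doc) => {
      const id = String(doc._id); // _id is returned unless it is projected out
      if (seenIds.has(id)) return; // defensive: skip a document seen before
      seenIds.add(id);
      handle(doc);
    });
  return seenIds.size; // number of distinct documents processed
}

// Usage in the shell (not executed here):
// const processed = scanByCid(db.customers, (doc) => printjson(doc));
```

Comparing the returned number with the collection's document count afterwards is a cheap sanity check that nothing was skipped.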
Upvotes: 2