Reputation: 11356
Imagine a collection with about 5,000,000 documents. I need to do a basicCursor query to select ~100 documents based on too many fields to index. Let's call this the basicCursorMatch. This will be immensely slow.
I can however do a bTreeCursor query on a few indexes that will limit my search to ~500 documents. Let's call this query the bTreeCursorMatch.
Is there a way I can do this basicCursorMatch directly on the cursor or collection resulting from the bTreeCursorMatch?
Intuitively I tried
var cursor = collection.find(bTreeCursorMatch);
var results = cursor.find(basicCursorMatch);
similar to collection.find(bTreeCursorMatch).find(basicCursorMatch), which doesn't seem to work.
Alternatively, I was hoping I could do something like this:
collection.aggregate([
  {$match: bTreeCursorMatch},  // uses an index: 5,000,000 -> 500, fast
  {$match: basicCursorMatch},  // no index: 500 -> 100, 'slow'
  {$sort: sortSpec}            // placeholder sort specification
]);
... but it seems that I cannot do this either. Is there an alternative way to do what I want?
The reason I am asking is that this second query will differ a lot, and there is no way I can index all the fields. But I do want to make that first query using a bTreeCursor, otherwise querying the whole collection with a basicCursor will take forever.
Update
Also, through user input the subselection of 500 documents will be queried in different ways during a session, with an unpredictable basicCursor query using multiple $in, $eq, $gt and $lt operators. During all of this, the bTreeCursor subselection remains the same. Should I just keep doing both queries for every user query, or is there a more efficient way to keep a reference to this subselection?
Upvotes: 3
Views: 2836
Reputation: 4928
In practice, you rarely need to run a second query on a cursor. In particular, you don't need to break MongoDB's work into separate indexable / non-indexable chunks.
If you pass a query to MongoDB's find method that can be partially fulfilled by a look-up in an index, MongoDB will do that look-up first, and then do a full scan on the remaining documents.
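Applied to your case, that means the two match documents can simply be combined into one query. A minimal sketch (bTreeCursorMatch and basicCursorMatch are the query documents from your question, joined here with a plain $and):

collection.find( { $and : [ bTreeCursorMatch, basicCursorMatch ] } )

MongoDB should use the index for the bTreeCursorMatch conditions and then scan only the resulting ~500 candidates against basicCursorMatch.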
For instance, I have a collection users with documents like:
{ _id : 4, gender : "M", ... }
There is an index on _id, but not on gender. There are ~200M documents in users.
To get an idea of what MongoDB is doing under the hood, add explain() to your cursor (in the Mongo shell):
> db.users.find( { _id : { $gte : 1, $lt : 10 } } ).explain()
{
    "cursor" : "BtreeCursor oldId_1_state_1",
    "n" : 9,
    "nscannedObjects" : 9
}
I have cut out some of the fields returned by explain. Basically, cursor tells you whether an index is being used, n is the number of documents returned by the query, and nscannedObjects is the number of objects scanned during the query. In this case, MongoDB was able to scan exactly the right number of objects.
What happens if we now query on gender as well?
> db.users.find( { _id : { $gte : 1, $lt : 10 }, gender : "F" } ).explain()
{
    "cursor" : "BtreeCursor oldId_1_state_1",
    "n" : 5,
    "nscannedObjects" : 9
}
find returns 5 objects, but had to scan 9 documents. It was able to isolate the correct 9 candidate documents using the _id field, and then it went through all 9 of them and filtered them by gender.
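Regarding your update: since the indexed bTreeCursorMatch part narrows every run down to ~500 candidates, it is usually fine to just rebuild and re-run the combined query for each user interaction rather than trying to hold on to the intermediate result. A sketch of the shape this takes (the field names and sort specification here are made up for illustration):

// fixed, index-backed part of the query (stays the same for the whole session)
var bTreeCursorMatch = { status : "active", region : "EU" };

// user-driven part, rebuilt on every request ($in, $eq, $gt, $lt, ...)
var basicCursorMatch = { score : { $gt : 10 }, tags : { $in : [ "a", "b" ] } };

collection.find( { $and : [ bTreeCursorMatch, basicCursorMatch ] } ).sort( { score : -1 } );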
Upvotes: 1