Redsandro
Redsandro

Reputation: 11356

Can I do a second 'query' on a MongoDB cursor?

Imagine a collection with about 5,000,000 documents. I need to do a basicCursor query to select ~100 documents based on too many fields to index. Let's call this the basicCursorMatch. This will be immensely slow.

I can however to a bTreeCursor query on a few indexes that will limit my search to ~500 documents. Let's call this query the bTreeCursorMatch.

Is there a way I can do this basicCursorMatch directly on the cursor or collection resulting from the bTreeCursorMatch?

Intuitively I tried

var cursor = collection.find(bTreeCursorMatch);
var results = cursor.find(basicCursorMatch);

similar to collection.find(bTreeCursorMatch).find(basicCursorMatch), which doesn't seem to work.

Alternatively, I was hoping I could do something like this:

collection.aggregate([
    {$match: bTreeCursorMatch}, // Uses index 5,000,000 -> 500 fast
    {$match: basicCursorMatch}, // No index, 500 -> 100 'slow'
    {$sort}
]);

.. but it seems that I cannot do this either. Is there an alternative to do what I want?

The reason I am asking is because this second query will differ a lot and there is no way I can index all the fields. But I do want to make that first query using a bTreeCursor, otherwise querying the whole collection will take forever using a basicCursor.

update

Also, through user input the subselection of 500 documents will be queried in different ways during a session with an unpredictable basicCursor query, using multiple $in $eq $gt $lt. But during this, the bTreeCursor subselection remains the same. Should I just keep doing both queries for every user query, or is there a more efficient way to keep a reference to this collection?

Upvotes: 3

Views: 2836

Answers (1)

Pascal Bugnion
Pascal Bugnion

Reputation: 4928

In practice, you rarely need to run second queries on a cursor. You specially don't need to break MongoDB's work into separate indexable / non-indexable chunks.

If you pass a query to MongoDB's find method that can be partially fulfilled by a look-up in an index, MongoDB will do that look-up first, and then do a full scan on the remaining documents.

For instance, I have a collection users with documents like:

{ _id : 4, gender : "M", ... }

There is an index on _id, but not on gender. There are ~200M documents in users.

To get an idea of what MongoDB is doing under the hood, add explain() to your cursor (in the Mongo shell):

> db.users.find( { _id : { $gte : 1, $lt : 10 } } ).explain()
{
    "cursor" : "BtreeCursor oldId_1_state_1",
    "n" : 9,
    "nscannedObjects" : 9
}

I have cut out some of the fields returned by explain. Basically, cursor tells you if it's using an index, n tells you the number of documents returned by the query and nscannedObjects is the number of objects scanned during the query. In this case, mongodb was able to scan exactly the right number of objects.

What happens if we now query on gender as well?

> db.users.find( { _id : { $gte : 1, $lt : 10 }, gender : "F" } ).explain()
{
    "cursor" : "BtreeCursor oldId_1_state_1",
    "n" : 5,
    "nscannedObjects" : 9
}

find returns 5 objects, but had to scan 9 documents. It was therefore able to isolate the correct 9 documents using the _id field. It then went through all 9 documents and filtered them by gender.

Upvotes: 1

Related Questions