Mongodb scanAndOrder index unexpected behaviour

Question

I'm not understanding this mongodb behaviour, my collection has the following index (names are simplified for clarity sake) and consists of ~ 50k documents.

{
    "v" : 1,
    "key" : {
        "a" : 1,
        "b" : -1
    },
    "ns" : "db.articles",
    "name" : "a_1_b_-1",
    "background" : false,
    "dropDups" : false
}

The following query

db.articles.find({ a: {"$in": ["foo", "bar"] } }).sort({b: -1}).limit(10).explain()

returns:

{
    "cursor" : "BtreeCursor a_1_b_-1 multi",
    "isMultiKey" : false,
    "n" : 10,
    "nscannedObjects" : 20,
    "nscanned" : 21,
    "nscannedObjectsAllPlans" : 68,
    "nscannedAllPlans" : 105,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "a" : [
            [
                "foo",
                "foo"
            ],
            [
                "bar",
                "bar"
            ]
        ],
        "b" : [
            [
                {
                    "$maxElement" : 1
                },
                {
                    "$minElement" : 1
                }
            ]
        ]
    },
    "server" : "localhost:27017"
}

"scanAndOrder" is true which means the order in the index can't be used to sort the return set. This means that the query will choke when given an offset (i.e. skip 10000) and subsequently will return "too much data for sort() with no index. add an index or specify a smaller limit". When the query is altered to only do a single equality check, the index is used as expected:

db.articles.find({ a: "foo" }).sort({b: -1}).limit(10).explain()

The resultset order is now the order the documents have in the index:

"scanAndOrder" : false

So it seems to have to do with how the "$in" operator behaves with index lookups?

Vitaly Muminov · Accepted Answer

Compound indices in MongoDB are stored in a tree-like data structure, e.g., assuming we created index on a and b fields, for each value of a there would be associated list of b values.

BTree cursor's $in operator returns list of references to the a values, therefore sort operator have to merge several lists of b values before sorting and no index can be used after.

Mongodb scanAndOrder index unexpected behaviour

Answers (1)

Related Questions