Arthur Neves
Arthur Neves

Reputation: 12138

removing "scanAndOrder" : true in my MongoDB query result

So i have a collection in my db with the following shardkey : {cl:"yyyy-mm", user_id:N}

when I perform the follow query

db.collection.find({ cl:"2012-03", user_id:{$in:users}, _id:{"$lt":new ObjectId('4f788b54204cfa4946000044')} }).sort({_id:-1}).limit(5).explain(true)

it gives me this result:

"clusteredType" : "ParallelSort",
"shards" : {
    "set1/hostname.com:27018" : [
        {
            "cursor" : "BtreeCursor cl_1_user_id_1 multi",
            "nscanned" : 21294,
            "nscannedObjects" : 21288,
            "n" : 5,
            "scanAndOrder" : true,
            "millis" : 1258,
            "nYields" : 69,
            "nChunkSkips" : 0,
            "isMultiKey" : false,
            "indexOnly" : false,
            "indexBounds" : { ...

So how can I make the sort to use the index then I dont need to scan all 21288 documents, just to return last 5 ?

Upvotes: 5

Views: 1869

Answers (4)

Eric
Eric

Reputation: 450

Thanks for the shoutout on Dex!

If it's not too late here in 2013, the index I recommend for avoiding scanAndOrder here is { _id: -1, cl: 1, user_id: 1 }.

The reason is because an $lt on _id and an $in on user_id constitute ranges across multiple index "buckets". An index of any other order than the above means those buckets must still be sorted together to satisfy a sort on _id. By putting _id first, all documents visited in the index will be properly-ordered in advance.

Note that this is a slight improvement over Andre's suggestion ({ _id: -1, user_id: 1, cl: 1 }, which should also avoid scanAndOrder) because it allows the straight equivalency check on cl to prune results.

Check out http://blog.mongolab.com/2012/06/cardinal-ins/ for more detail.

Upvotes: 4

SDen
SDen

Reputation: 191

AFAIK, conditional operators $gt, $lt etc. (comparison functions in mongo queries) makes mongo to not use an index at all (for that part of query). Mongo must scan all documents in a collection to execute such query.

So, this part:

find({ cl:"2012-03", user_id:{$in:users}, _id:{"$lt":new ObjectId('4f788b54204cfa4946000044')} })

will not use index, even if it exists. That makes it slow.

UPDATE Conditional operators $gt, $lt etc. in query allows to use index, but not that efficient as without them. Still mongodb has to scan more documents that returns in result.

Upvotes: 1

Andre de Frere
Andre de Frere

Reputation: 2743

Because you are using a $lt I don't know if you are going to be able to remove a scanAndOrder operation from your query. The regular axiom is that you have the sort field as the last member of your index, but this breaks down where there is a range query going on. You can usually resolve this by inverting the order of the index, inserting the sort field as the first member. There is an additional problem in your case, as the collection is sharded and therefore you will likely always choose at least the shard key index over the 'inverted order index'.

Without knowing the distribution of your data, it is hard to recommend a specific course of action. From the brief testing I have done, adding an index to {cl:1,user_id:1,_id:-1} roughly halved the nscanned and nscannedobjects. This would be using the sort field as the last member of the index, but has the pitfall as discussed above. You could also try the inverse of this {_id:1,user_id:1,cl:-1} but you might find the shard key is going to get picked over that index. You could further try forcing the inverse index with a hint, but this did not lead to any performance gains in my testing.

Upvotes: 3

jmcd
jmcd

Reputation: 4300

Try Dex to make sure that the index is working the way you think it should be: https://github.com/mongolab/dex

Upvotes: 2

Related Questions