How do I index the following query in mongodb?

Question

I am trying to figure out the best index to use for this in mongodb:

db.articles.find({"images.url":{"$exists":true}, \  
    "source_id": {"$in":[ObjectId("511baf3aa56bde8e94000002"), ObjectId("511baf3aa56bde8e94000999")]}})  \
    .sort({"published_at": -1})

I only want to include articles where the images.url exists, so I'm wondering if it would be a sparse index? And not sure which fields to index in order, as i've read different pointers of:

First, fields on which you will query for exact values. ("images.url": exists)
Second, fields on which you will sort. (:published_at)
Finally, fields on which you will query for a range of values. (source_id)

Also, in the example above, I am not sure whether source_id would be a range of values or not?

I was thinking:

index "images.url": -1, published_at: -1, source_id: 1, {sparse: true}

But I'm also torn on maximing exclusivity for an index, so I am considering:

index source_id: 1, "images.url": -1, published_at: -1, {sparse: true}

ronasta · Accepted Answer

If we have a collection like this

{ a:1, b:1, c:1 }
{ a:1, b:1, c:2 }
{ a:1, b:1, c:3 }
{ a:1, b:2, c:1 }
... // all permutations up to:
{ a:3, b:3, c:3 }

imagine this collection in random order

this is how the compound index on ({a:1,b:1,c:1}) would look like

a:  |        1        |        2        |        3        |
    |-----------------+-----------------+-----------------|                   
b:  |  1  |  2  |  3  |  1  |  2  |  3  |  1  |  2  |  3  |
    |-----+-----+-----+-----+-----+-----+-----+-----+-----|
c:  |1|2|3|1|2|3|1|2|3|1|2|3|1|2|3|1|2|3|1|2|3|1|2|3|1|2|3|

for each a you have all its b with all its c in turn, okay?

For the query: db.xx.find({a:2}).sort({b:1}), you can see that the b elements are in order below the a=2; the index will be used for sorting - "scanAndOrder" : false in explain(). The same happens, if your query is db.xx.find({a:2,c:{$in:[1,3]}}).sort({b:1})

But this: db.xx.find({a:{$in:[1,3]}}).sort({b:1}).explain() will tell you "scanAndOrder" : true, which means that the index was not used for sorting (it was used for the query, though) - from the schema above you can see, that "b" is not in sequence for a=[1,3].

That's why the efficient sequence for indexes is:

(1) exact matches (only one!)
(2) sort criteria
(3) matches that point to more than one document

In your case, there is no exact match; both queries return more than one document. Let's try this out in our example:

db.xx.find({a:{$in:[1,3]},b:{$in:[1,3]}}).sort({c:1}).explain(): uses the index for querying, but not for sorting, it scans 15 and returns 12 objects.

db.xx.find({b:{$in:[1,3]},c:{$in:[1,3]}}).sort({a:1}).explain(): uses the index for querying and for sorting, but scans 21 and returns 12 objects.

Which one is better? It will depend on your use case. If your find usually returns many documents, it could be more efficient to have the sort use the index - but if it normally returns only a few (out of many) then you might prefer the more efficient scan. Try it out and see what's better using explain()

Does this help?

regards

Ronald

P.S. I used this to create the example collection:

[1,2,3].forEach(function(a){
    [1,2,3].forEach(function(b){
        [1,2,3].forEach(function(c){
            db.xx.insert({a:a,b:b,c:c});
        })
    })
})

How do I index the following query in mongodb?

Answers (1)

Related Questions