archit
archit

Reputation: 69

Mongo range query slow on objects inside array

I have a collection orders

{
    "_id": "abcd",
    "last_modified": ISODate("2016-01-01T00:00:00Z"),
    "suborders": [
        {
            "suborder_id": "1",
            "last_modified: ISODate("2016-01-02T00: 00: 00Z")
        },  {
            "suborder_id":"2",
            "last_modified: ISODate("2016-01-03T00:00:00Z")
        }
    ]
}

I have two indexes on this collection: {"last_modified":1} {"suborders.last_modified": 1}

when I use range queries on last_modified, indexes are properly used, and results are returned instantly. eg query: db.orders.find({"last_modified":{$gt:ISODate("2016-09-15"), $lt:ISODate("2016-09-16")}});

However, when I am querying on suborders.last_modified, the query takes too long to execute. eq query:db.orders.find({"suborders.last_modified":{$gt:ISODate("2016-09-15"), $lt:ISODate("2016-09-16")}});

Please help debug this.

Upvotes: 0

Views: 1192

Answers (1)

logan rakai
logan rakai

Reputation: 2557

The short answer is to use min and max to set the index bounds correctly. For how to approach debugging, read on.

A good place to start for query performance issues is to attach .explain() at the end of your queries. I made a script to generate documents like yours and execute the queries you provided.

I used mongo 3.2.9 and both queries do use the created indices with this setup. However, the second query was returning many more documents (approximately 6% of all the documents in the collection). I suspect that is not your intention.

To see what is happening lets consider a small example in the mongo shell:

> db.arrayFun.insert({
    orders: [
        { last_modified: ISODate("2015-01-01T00:00:00Z") },
        { last_modified: ISODate("2016-01-01T00:00:00Z") }
    ]
})
WriteResult({ "nInserted" : 1 })

then query between May and July of 2015:

> db.arrayFun.find({"orders.last_modified": {
    $gt: ISODate("2015-05-01T00:00:00Z"),  
    $lt: ISODate("2015-07-01T00:00:00Z")
}}, {_id: 0})
{ "orders" : [ { "last_modified" : ISODate("2015-01-01T00:00:00Z") }, { "last_modified" : ISODate("2016-01-01T00:00:00Z") } ] }

Although neither object in the array has last_modified between May and July, it found the document. This is because it is looking for one object in the array with last_modified greater than May and one object with last_modified less than July. These queries cannot intersect multikey index bounds, which happens in your case. You can see this in the indexBounds field of explain("allPlansExecution") output, in particular one of the lower bound or upper bound Date will not be what you specified. This means that a large number of documents may need to be scanned to complete the query depending on your data.

To find objects in the array that have last_modified between two bounds, I tried using $elemMatch.

db.orders.find({"suborders": {
    $elemMatch:{ 
        last_modified:{
            "$gt":ISODate("2016-09-15T00:00:00Z"), 
            "$lt":ISODate("2016-09-16T00:00:00Z")
         }
     }
}})

In my test this returned about 0.5% of all documents. However, it was still running slow. The explain output showed it was still not setting the index bounds correctly (only using one bound).

What ended up working best was to manually set the index bounds with min and max.

db.subDocs.find()
    .min({"suborders.last_modified":ISODate("2016-09-15T00:00:00Z")})
    .max({"suborders.last_modified":ISODate("2016-09-16T00:00:00Z")})

Which returned the same documents as $elemMatch but used both bounds on the index. It ran in 0.021s versus 2-4s for elemMatch and the original find.

Upvotes: 1

Related Questions