Mirodil
Mirodil

Reputation: 2329

Increase Aggregation Framework Performance in MongoDb

I have 200k records in my collection. My data model looks like as follows:

{
    "_id" : ObjectId("51750ec159dcef125863b7c4"),
    "DateAdded" : ISODate("2013-04-22T00:00:00.000Z"),
    "DateRemoved" : ISODate("2013-12-22T00:00:00.000Z"),
    "DealerID" : ObjectId("51750bd559dcef07ec964a41"),
    "ExStockID" : "8324482",
    "Make" : "Mazda",
    "Model" : "3",
    "Price" : 11479,
    "Year" : 2012,
    "Variant" : "1.6d (115) TS 5dr",
    "Turnover": 150
} 

I have several indexes for the collection, one of those used for aggregation framework is:

{
    "DealerID" : 1,
    "DateRemoved" : -1,
    "Price" : 1,
    "Turnover" : 1
}

The aggregate query which is being used:

db.stats.aggregate([
{
    "$match": {
        "DealerID": {
            "$in": [
                ObjectId("523325ac59dcef1b90a3d446"),
                ....
                // here is specified more than 150 ObjectIds
            ]
        },
        "DateRemoved": {
            "$gte": ISODate("2013-12-01T00:00:00Z"),
            "$lt": ISODate("2014-01-01T00:00:00Z")
        }
    }
},
{ "$project" : { "Price":1, "Turnover":1 } },
{
    "$group": {
        "_id": null,
        "Price": {
            "$avg": "$Price"
        },
        "Turnover": {
            "$avg": "$Turnover"
        }
    }
}]);

and the time for this query executions resides between 30-200 seconds.

How can I optimize this?

Upvotes: 0

Views: 419

Answers (2)

Mirodil
Mirodil

Reputation: 2329

Using @Derick's answer I have found the index which prevented to create the covered index. As far as I can see query optimizer uses the first index which covers just the query itself, so I have changed the order of indexes. So here is outcome before and after.

Before:

{
    "serverPipeline" : [ 
        {
            "query" : {...},
            "projection" : { "Price" : 1, "Turnover" : 1, "_id" : 0 },
            "cursor" : {
                "cursor" : "BtreeCursor DealerIDDateRemoved multi",
                "isMultiKey" : false,
                "n" : 11036,
                "nscannedObjects" : 11008,
                "nscanned" : 11307,
                "nscannedObjectsAllPlans" : 11201,
                "nscannedAllPlans" : 11713,
                "scanAndOrder" : false,
                "indexOnly" : false,
                "nYields" : 0,
                "nChunkSkips" : 0,
                "millis" : 58,
                "indexBounds" : {...},
                "allPlans" : [...],
                "oldPlan" : {...},
                "server" : "..."
            }
        }, 
        {
            "$group" : {...}
        }
    ],
    "ok" : 1
}

After these changes indexOnly param now shows true, this means we have just created the covered index:

{
    "serverPipeline" : [ 
        {
            "query" : {...},
            "projection" : { "Price" : 1, "Turnover" : 1, "_id" : 0 },
            "cursor" : {
                "cursor" : "BtreeCursor DealerIDDateRemovedPriceTurnover multi",
                "isMultiKey" : false,
                "n" : 11036,
                "nscannedObjects" : 0,
                "nscanned" : 11307,
                "nscannedObjectsAllPlans" : 285,
                "nscannedAllPlans" : 11713,
                "scanAndOrder" : false,
                "indexOnly" : true,
                "nYields" : 0,
                "nChunkSkips" : 0,
                "millis" : 58,
                "indexBounds" : {...},
                "allPlans" : [...],
                "server" : "..."
            }
        }, 
        {
            "$group" : {...}
    ],
    "ok" : 1
}

Now the query works approximately between 0.085-0.300 seconds. Additional information about covered queries Create Indexes that Support Covered Queries

Upvotes: 1

Derick
Derick

Reputation: 36794

You can try to run explain on the aggregation pipeline, but as I don't have your full dataset, I can't try it out properly:

p = [
{
    "$match": {
        "DealerID": {
            "$in": [
                ObjectId("51750bd559dcef07ec964a41"),
                ObjectId("51750bd559dcef07ec964a44"),
            ]
        },
        "DateRemoved": {
            "$gte": ISODate("2013-12-01T00:00:00Z"),
            "$lt": ISODate("2014-01-01T00:00:00Z")
        }
    }
},
{ "$project" : { "Price":1, "Turnover":1 } },
{
    "$group": {
        "_id": null,
        "Price": {
            "$avg": "$Price"
        },
        "Turnover": {
            "$avg": "$Turnover"
        }
    }
}];

db.s.runCommand('aggregate', { pipeline: p, explain: true } );

I would suggest to remove the fields that are not part of the $match (Price and Turnover). Also, I think you should switch the order of DealerId and DateRemoved as you want to do one range search, and from that range then include all the dealers. Doing it the other way around means that you can really only use the index for the 150 single items, and then you need to do a range search.

Upvotes: 2

Related Questions