rahul anand

Reputation: 97

Aggregation Performance Degradation on increasing load

I am running a 3-node MongoDB cluster (version 3.0, WiredTiger storage engine) with 10 GB of RAM.

I have around 2 million documents, each with 25-30 fields, of which 2 are elementary arrays.

I am running an aggregation query which takes around 150-170 milliseconds.

When I generate a load of 100 queries/sec, performance starts degrading and latency reaches up to 2 seconds.

Query

db.testCollection.aggregate([
    { $match: { vid: { $in: ["001", "002"] }, ss: "N", spt: { $gte: new Date("2016-06-29") }, spf: { $lte: new Date("2016-06-27") } } },
    { $match: { $or: [ { sc: { $elemMatch: { $eq: "TEST" } } }, { sc: { $exists: false } } ] } },
    { $match: { $or: [ { pt: { $ne: "RATE" } }, { rpis: { $exists: true } } ] } },
    { $project: { vid: 1, pid: 1, pn: 1, pt: 1, spf: 1, spt: 1, bw: 1, bwe: 1, st: 1, et: 1, ls: 1, dw: 1, at: 1, dt: 1, d1: 1, d2: 1, mldv: 1, aog: 1, nn: 1, mn: 1, rpis: 1, lmp: 1, cid: 1, van: 1, vad: 1, efo: 1, sc: 1, ss: 1, m: 1, pr: 1, obw: 1, osc: 1, m2: 1, crp: 1, sce: 1, dce: 1, cns: 1 } },
    { $group: { _id: null, data: { $push: "$$ROOT" } } },
    { $project: { _id: 1, data: 1 } }
])

There is a compound index on all the fields, in the same order as used in the query (except "rpis", since a compound index can include only one array field).
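For illustration, the createIndex call would look roughly like the following (the field list here is an assumption based on the $match fields above, not the exact index definition):

// assumed field order, taken from the $match fields of the query above
db.testCollection.createIndex({ vid: 1, ss: 1, spt: 1, spf: 1, sc: 1, pt: 1 })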

Please suggest where I am going wrong.

Upvotes: 0

Views: 90

Answers (1)

profesor79

Reputation: 9473

The last two stages are unnecessary. The final $group is very heavy, as it builds a new array in memory; your results should be consumed by the application at that point rather than grouped on the server. There could also be a green light to remove the preceding $project, as it may be cheaper to push the full document down to the client - this is worth a try.
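As a minimal sketch (not your exact driver code), the application could iterate the cursor returned by aggregate() and collect the documents itself instead of asking the server to build the array:

// Sketch: collect documents client-side instead of using a $group/$push stage
var cursor = db.testCollection.aggregate([ /* $match (and optional $project) stages only */ ]);
var data = [];
cursor.forEach(function (doc) {
    data.push(doc); // the array is built in the application, not in server memory
});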

When $match is the first stage, the index is used, but there is a real risk that the 2nd and 3rd $match stages work on the result set produced by the first stage instead of using the indexes you created. If you have a way, try to compress the $match stages into a single one and see how the query performs (a sketch of this is shown after the simplified query below).

A simplified version of the query is below:

db.testCollection.aggregate([{
            $match : {
                vid : {
                    $in : ["001", "002"]
                },
                ss : "N",
                spt : {
                    $gte : new Date("2016-06-29")
                },
                spf : {
                    $lte : new Date("2016-06-27")
                }
            }
        }, {
            $match : {
                $or : [{
                        sc : {
                            $elemMatch : {
                                $eq : "TEST"
                            }
                        }
                    }, {
                        sc : {
                            $exists : false
                        }
                    }
                ]
            }
        }, {
            $match : {
                $or : [{
                        pt : {
                            $ne : "RATE"
                        }
                    }, {
                        rpis : {
                            $exists : true
                        }
                    }
                ]
            }
        }])
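If you want to try the single-$match variant mentioned above, a sketch would look like this (the same conditions, just combined into one stage with $and):

db.testCollection.aggregate([{
        $match : {
            vid : { $in : ["001", "002"] },
            ss : "N",
            spt : { $gte : new Date("2016-06-29") },
            spf : { $lte : new Date("2016-06-27") },
            // both $or branches from the separate stages, combined with $and
            $and : [
                { $or : [ { sc : { $elemMatch : { $eq : "TEST" } } }, { sc : { $exists : false } } ] },
                { $or : [ { pt : { $ne : "RATE" } }, { rpis : { $exists : true } } ] }
            ]
        }
    }])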

Another potential issue is business rules that affect scaling the system out to a sharded environment - did you have an estimate of the load before you settled on this document structure?

Upvotes: 1
