Reputation: 97
I am running a 3-node MongoDB cluster (version 3.0, WiredTiger storage engine) with 10 GB of RAM.
I have around 2 million documents, each with 25-30 fields, of which 2 are elementary arrays.
I am running an aggregation query that takes around 150-170 milliseconds.
When I generate a load of 100 queries/sec, performance starts degrading and latency reaches up to 2 seconds.
db.testCollection.aggregate( [
{ $match: { vid: { $in: ["001","002"]} , ss :"N" , spt : { $gte : new Date("2016-06-29")}, spf :{ $lte : new Date("2016-06-27")}}},
{ $match: {$or : [{sc:{$elemMatch :{$eq : "TEST"}}},{sc :{$exists : false}}]}},
{ $match: {$or : [{pt:{$ne : "RATE"}},{rpis :{$exists : true}}]}},
{ $project: { vid: 1, pid: 1, pn: 1, pt: 1, spf: 1, spt: 1, bw: 1, bwe: 1, st: 1, et: 1, ls: 1, dw: 1, at: 1, dt: 1, d1: 1, d2: 1, mldv: 1, aog: 1, nn: 1, mn: 1, rpis: 1, lmp: 1, cid: 1, van: 1, vad: 1, efo: 1, sc: 1, ss: 1, m: 1, pr: 1, obw: 1, osc: 1, m2: 1, crp: 1, sce: 1, dce: 1, cns: 1 }},
{ $group: { _id: null , data: { $push: "$$ROOT" } } },
{ $project: { _id: 1 , data : 1 } }
]
)
There is a compound index on all of these fields, in the same order as they appear in the query (except "rpis",
since a compound index can include only one array field).
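The index was created along these lines (a sketch only; the exact field list and order are assumed from the query above, with "sc" being the single array field included):
// Assumed reconstruction of the compound index described above
db.testCollection.createIndex({ vid : 1, ss : 1, spt : 1, spf : 1, sc : 1, pt : 1 })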
Please suggest where I am going wrong.
Upvotes: 0
Views: 90
Reputation: 9473
The last two stages are unnecessary.
The final $group is very heavy, since it builds a new array in memory; at that point the result should be consumed by the application rather than grouped.
There may also be a green light to remove the preceding $project, as it could be cheaper to push the full document down to the client - this is worth a try.
When $match is the first stage, it can use an index; there is a real risk, however, that the 2nd and 3rd $match stages are applied to the result set of the first stage rather than using your indexes. If you have a way, try to collapse the $match stages into a single one and see how the query performs (a sketch of a fully merged $match follows after the simplified query below).
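One way to check how much of the pipeline is actually answered by the index is to run the aggregation with the explain option (available in 3.0) and look at the $cursor stage of the returned plan. A rough sketch, assuming the mongo shell and the same collection as above:
// Ask the server for the plan instead of the results; the "$cursor"
// stage of the output shows the winning plan used for the input scan.
db.testCollection.aggregate(
    [
        { $match : { vid : { $in : ["001", "002"] }, ss : "N" } },
        { $match : { $or : [ { pt : { $ne : "RATE" } }, { rpis : { $exists : true } } ] } }
    ],
    { explain : true }
)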
A simplified version of the query (without the $project and $group stages) is below:
db.testCollection.aggregate([
    { $match : {
        vid : { $in : ["001", "002"] },
        ss : "N",
        spt : { $gte : new Date("2016-06-29") },
        spf : { $lte : new Date("2016-06-27") }
    }},
    { $match : {
        $or : [
            { sc : { $elemMatch : { $eq : "TEST" } } },
            { sc : { $exists : false } }
        ]
    }},
    { $match : {
        $or : [
            { pt : { $ne : "RATE" } },
            { rpis : { $exists : true } }
        ]
    }}
])
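If the $match stages can be collapsed into a single one as suggested above, a rough sketch could look like the following (untested; it simply lifts the two $or conditions into a top-level $and, keeping the same fields and values as the original query):
// Sketch: all three $match stages combined into one, so the whole
// predicate is presented to the query planner at once.
db.testCollection.aggregate([
    { $match : {
        $and : [
            {
                vid : { $in : ["001", "002"] },
                ss : "N",
                spt : { $gte : new Date("2016-06-29") },
                spf : { $lte : new Date("2016-06-27") }
            },
            { $or : [ { sc : { $elemMatch : { $eq : "TEST" } } }, { sc : { $exists : false } } ] },
            { $or : [ { pt : { $ne : "RATE" } }, { rpis : { $exists : true } } ] }
        ]
    }}
])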
Another issue could be business rules that affect scaling the system to a sharded environment - did you have an estimate of the load before you settled on this document structure?
Upvotes: 1