Reputation: 21
I'm trying to optimize my MongoDB aggregation query for high scalability, but there's something I don't understand.
I am running the software on Node.js 10.x on Google App Engine, with MongoDB 3.6 on Atlas M30 servers.
I have a big collection of apartments (100k documents, and it will grow in the near future). Every apartment has its own information about size, location, prices and much more, all stored in a single nested document (objects of arrays of objects of arrays).
Because I have to search this collection for specific apartments and sort them by a calculated price, I have built a very big aggregation pipeline: almost 15 stages with various $lookup, $map and $reduce operators, for more than 500 lines of query.
This is just the flow of it:
$aggregation = [
  { $match: { $and: [
      { $geoWithin: '' },
      { field1: '' },
      { field2: '' },
      { fieldN: '' }
  ]}},                        // initial match
  { $project: {} },           // pass just a subset of the fields
  { $addFields: {
      $map: {}                // first step of processing data
  }},
  { $addFields: {
      $map: {                 // post-processing of the previous data
        $let: { $map: { $let: {} } }
      }
  }},
  { $lookup: {} },            // 1st external collection to join
  { $lookup: {} },            // 2nd external collection to join
  { $addFields: {} },         // appends 5 more calculated fields
  { $match: { $and: [
      { field1: '' },
      { field2: '' },
      { field3: '' }
  ]}},                        // final match on processed data
  { $sort: {} },
  { $group: {} },
  { $project: {} }            // for limiting and paginating results
]
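To give an idea, the initial $match stage looks roughly like this (the field names, coordinates and radius below are just placeholders, not the real ones from my project):

{ $match: {
    // "location" is assumed to be a GeoJSON Point with a 2dsphere index
    location: {
      $geoWithin: {
        // $centerSphere radius is in radians: km / 6378.1
        $centerSphere: [ [ 9.19, 45.46 ], 10 / 6378.1 ]
      }
    },
    bedrooms:  { $gte: 2 },   // examples of additional filters
    maxGuests: { $gte: 4 }
}}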
Everything seems to work fine: the query responds in around 300 ms and the user experience is very good. BUT yesterday I started doing some performance tests, sending a high number of requests in parallel with a load-testing tool, and the problem came out: after the first 10 requests, the database starts to enqueue the queries, their response time grows more and more until it exceeds 100 seconds for a single response, and then it stops responding!
Thank you
Upvotes: 1
Views: 375
Reputation: 21
It seems I have found where the problem is: the data structure itself!
This is an extract of the whole apartment object:
{
  _id: ...,
  info: {},
  pricing: {},
  conditions: {},
  ...
  seasons: [
    {
      rules: {},
      pricing: {},
      conditions: {},
      ...
    },
    {},
    {},
    {},
    ...
  ]
}
The problem is the seasons array! Its growing size could compromise MongoDB's internal JavaScript engine.
The solution was to move the seasons array out of the apartment objects into a separate collection, with a proper index and a field referencing the parent apartment. In the main aggregation query, a $lookup stage joins the seasons collection, filtered so it attaches only the seasons needed, which will be 2 or 3 instead of 300!
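A rough sketch of the idea (the collection name, field names and date variables here are just illustrative, not my exact schema):

// seasons collection: one document per season, each referencing its apartment
// { _id, apartmentId, from, to, rules: {}, pricing: {}, conditions: {} }
db.seasons.createIndex({ apartmentId: 1, from: 1, to: 1 })

// in the main aggregation, join only the seasons that overlap the requested stay
{ $lookup: {
    from: 'seasons',
    let: { aptId: '$_id' },
    pipeline: [
      { $match: {
          $expr: { $eq: [ '$apartmentId', '$$aptId' ] },
          from: { $lte: checkOut },   // checkIn / checkOut are the requested dates
          to:   { $gte: checkIn }
      }}
    ],
    as: 'seasons'
}}

This way every apartment gets only the handful of matching seasons attached, instead of carrying the whole array through the rest of the pipeline.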
This change seemed weird to me, but it sped up query execution by 10-15x!
Hope it helps!
Upvotes: 1