cmarrero01

Reputation: 711

Improve performance of Aggregate query on MongoDB

I have a big problem with MongoDB's aggregation framework. I will try to explain the situation as well as I can, but the code should make it clear.

We have two collections, category and note. Each note has a category id field (category) and a createdAt field, and we want to get the most recent note for each category.

So we wrote the following aggregation:

(async () => {
        // db, skip, limit, callback and success come from the surrounding code (not shown)

        // fetch the categories (only their names)
        const categoryCollection = db.collection('category');
        const categoryList = await categoryCollection.find({}, { projection: { name: 1 } }).toArray();

        // build an array of category ids
        const categoryIds = categoryList.map(function(e) {
            return e._id;
        });

        // fetch one note per category, sorted by creation date in descending order
        const articlesColl = db.collection('note');
        const articles = await articlesColl.aggregate([
            { $match: { category: { $in: categoryIds } } },
            { $sort: { createdAt: -1 } },
            {
                $group: {
                    _id: "$category",
                    note: { $first: "$$ROOT" }
                }
            },
            { $replaceRoot: { newRoot: "$note" } },
            { $project: { _id: 1, title: 1, image: 1, category: 1 } },
            { $skip: skip },
            { $limit: limit }
        ], { allowDiskUse: true }).toArray();

        callback(null, success(
            // replace each category id with the category name
            articles.map(
                function(doc) {
                    doc.categoryName = categoryList.find(e => e._id.equals(doc.category)).name;
                    return doc;
                }
            )
        ));
})().catch(err => callback(err)); // forward errors (e.g. a rejected query) to the node-style callback

This query returns the most recent note for each category, but its performance is very poor.
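To make the pipeline's semantics concrete, here is a minimal in-memory sketch in plain JavaScript (no database; the sample documents are hypothetical, with string ids and Date values instead of ObjectIds) of what $match, $sort, $group with $first, and $replaceRoot compute together. It also shows where the cost comes from: before $group can pick the first note per category, every matched note has to be sorted, which without a suitable index means sorting them all in memory.

```javascript
// In-memory sketch of the question's pipeline:
// $match (category in ids) -> $sort (createdAt desc) -> $group/$first -> $replaceRoot.
// Plain objects stand in for MongoDB documents; no database is involved.
function latestNotePerCategory(notes, categoryIds) {
  const wanted = new Set(categoryIds);
  // $match: keep notes whose category is one of the requested ids
  const matched = notes.filter(n => wanted.has(n.category));
  // $sort: every matched note is ordered, newest first
  const sorted = [...matched].sort((a, b) => b.createdAt - a.createdAt);
  // $group with $first: the first note seen for each category is its newest one
  const firstByCategory = new Map();
  for (const note of sorted) {
    if (!firstByCategory.has(note.category)) {
      firstByCategory.set(note.category, note);
    }
  }
  // $replaceRoot: return the notes themselves
  return [...firstByCategory.values()];
}

// Hypothetical sample data
const notes = [
  { _id: 1, category: "a", createdAt: new Date("2018-08-01"), title: "old a" },
  { _id: 2, category: "a", createdAt: new Date("2018-08-15"), title: "new a" },
  { _id: 3, category: "b", createdAt: new Date("2018-08-10"), title: "only b" },
  { _id: 4, category: "c", createdAt: new Date("2018-08-16"), title: "not requested" },
];
const latest = latestNotePerCategory(notes, ["a", "b"]);
console.log(latest.map(n => n.title)); // [ 'new a', 'only b' ]
```

MongoDB does not guarantee the order of $group output, so the $skip/$limit pages in the question may not be stable from run to run; the Map in this sketch merely happens to preserve insertion order.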

I use MongoDB Atlas, and its alerts report the following:

QUERY INEFFICIENCY SCORE: 258393, EXECUTION COUNT 4, AVERAGE EXECUTION TIME 2872 MS

And this is an example of the logged query:

0: $match: { category: { $in: [
       5a4536cd920f3a5acdf33a60, 5a4536cd920f3a5acdf33a55, 5a4536cd920f3a5acdf33a53, 5a4536cd920f3a5acdf33a66,
       5a4536cd920f3a5acdf33a5a, 5a4536cd920f3a5acdf33a56, 5a4536cd920f3a5acdf33a51, 5a4536cd920f3a5acdf33a58,
       5a4536cd920f3a5acdf33a5b, 5a4536cd920f3a5acdf33a57, 5a4536cd920f3a5acdf33a63, 5a4536cd920f3a5acdf33a5d,
       5a4536cd920f3a5acdf33a5c, 5a4536cd920f3a5acdf33a59, 5a4536cd920f3a5acdf33a52, 5a4536cd920f3a5acdf33a5e,
       5a4536cd920f3a5acdf33a65, 5a4536cd920f3a5acdf33a61, 5b202ef5d03337b3a0227daf, 5a4536cd920f3a5acdf33a64,
       5a4536cd920f3a5acdf33a62, 5a4536cd920f3a5acdf33a5f, 5a4536cd920f3a5acdf33a54
   ] } }   (23 category ids)
1: $sort: { createdAt: -1 }
2: $group: { _id: "$category", note: { $first: "$$ROOT" } }
3: $replaceRoot: { newRoot: "$note" }
4: $project: { _id: 1, title: 1, image: 1, category: 1 }
5: $skip: 0
6: $limit: 8

Fri Aug 17 2018 10:11am, 6283 ms, nScanned / nReturned: 1033573 / 8
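The last line of that log reads as documents scanned over documents returned. As a rough sanity check, the ratio for that single logged execution can be computed directly; this is plain arithmetic on the logged figures (1033573 scanned, 8 returned), not an Atlas API, and says nothing about how Atlas averages its alert score across executions.

```javascript
// Plain arithmetic on the logged execution: documents scanned per document returned.
function scannedPerReturned(nScanned, nReturned) {
  return nScanned / nReturned;
}

const ratio = scannedPerReturned(1033573, 8);
console.log(Math.round(ratio)); // 129197
```

In other words, for each of the 8 notes returned, the server looked at roughly 129 thousand documents.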

The big problem is that this query is really slow; sometimes it takes more than 6 seconds to finish.

Any ideas on how to improve this?

Upvotes: 0

Views: 550

Answers (2)

Joe

Reputation: 28326

That query inefficiency score means that, for every document you return, the query examines 258393 documents.

Have you considered iterating over categoryList and running a separate query per category to get just its most recent note?

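// runs inside the async function from the question
const articles = (await Promise.all(categoryList.map(function(e) {
            // one lookup per category: its newest note, or null if it has none
            return articlesColl.findOne({ category: e._id }, { sort: { createdAt: -1 }, projection: { title: 1, image: 1, category: 1 } });
        }))).filter(Boolean);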

If you create an index on {category: 1, createdAt: -1}, each of these queries needs to examine only a single document per category (23 in total in your example). Even with the additional network round trips, cutting the number of documents examined from about a million (1033573 in the logged execution) down to 23 should let them all complete in well under 6 seconds.
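A minimal sketch of why this works, assuming nothing about MongoDB internals: the "index" below is a hypothetical in-memory stand-in for {category: 1, createdAt: -1}, i.e. notes bucketed by category with each bucket ordered newest first. Building it is a one-time cost (in MongoDB the index is maintained on writes), after which each per-category lookup reads only the first entry of its bucket, much like find().sort().limit(1) over that index.

```javascript
// Hypothetical in-memory stand-in for an index on { category: 1, createdAt: -1 }:
// notes grouped by category, each group sorted newest first.
function buildIndex(notes) {
  const index = new Map();
  for (const note of notes) {
    if (!index.has(note.category)) index.set(note.category, []);
    index.get(note.category).push(note);
  }
  for (const bucket of index.values()) {
    bucket.sort((a, b) => b.createdAt - a.createdAt);
  }
  return index;
}

// Newest note per category, counting how many index entries were examined.
function latestViaIndex(index, categoryIds) {
  let examined = 0;
  const result = [];
  for (const id of categoryIds) {
    const bucket = index.get(id);
    if (!bucket || bucket.length === 0) continue; // category with no notes
    examined += 1; // only the first (newest) entry is read
    result.push(bucket[0]);
  }
  return { result, examined };
}

// Hypothetical data: 23 categories with 1,000 notes each (numeric createdAt)
const categoryIds = Array.from({ length: 23 }, (_, i) => `cat${i}`);
const notes = [];
for (const id of categoryIds) {
  for (let t = 0; t < 1000; t++) {
    notes.push({ category: id, createdAt: t });
  }
}
const { result, examined } = latestViaIndex(buildIndex(notes), categoryIds);
console.log(notes.length, examined, result.length); // 23000 23 23
```

The aggregation in the question, by contrast, has to sort all 23000 matched notes before $group can keep one per category.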

Upvotes: 0

Mạnh Quyết Nguyễn

Reputation: 18235

Given the size of your collections, doing the $sort in memory costs you too much processing time.

You should create an index on your createdAt field.

Upvotes: 0
