agnsaft
agnsaft

Reputation: 1881

Mongodb aggregation result caching

I have a rather large collection in mongodb with approx 100 000 documents (not sharded). This is a backend for a web application that basically just allow the user to browse different ways to view the same information in this collection.

For one of the views I try to count the number of occurrences of a field using the aggregation framework. This means aggregating the entire collection. The problem is that this aggregation operation (which is a simple pipeline of group, sort and limit) takes 2 seconds which is too slow for a web application.

So my question is; what is the preferred solution in order to cache the result of this aggregation operation? As far as I have found, it is not possible to "aggregate into" a new collection, or anything like that. At the moment the only solution I have found is to read the entire result into an variable and then inserting this variable into a new collection using insert - but I fear this involves sending a lot of data from the database => to my application => back to the database?

Any suggestions?

Example of pipeline:

res = items.aggregate([
    { "$group": { "_id": { "item_id": "$item_id", "title": "$title", "category": "$category" }, "count": { "$sum": 1 } } },
    { "$sort": { "count": -1 } },
    { "$limit": 5 }
])

The schema is basically those 3 fields + a few more that is really not relevant, i.e:

doc = {
    "item_id": 1000,
    "title": "this is the item title",
    "category": "this is the item category"
}

I have tried index on both item_id and all 3 fields with no success.

Upvotes: 3

Views: 2564

Answers (1)

Vitaly Greck
Vitaly Greck

Reputation: 668

Aggregation returns result to one document. The result is limited by 16M. The document is returned to app.

If you want to "aggregate" to a collection - use map-reduce.

map_function = function () {
        emit(this.item_id, {"item_id": this.item_id, /* any other info */ "count": 1});
};

reduce_function = function (key, values) {
        var result = {"item_id": key, /* any other info should be given from one or any of values array objects */ "count": 0};
        values.forEach(function (value) {
                result["count"] += value["count"];
        });
        return result;
};

Not sure whether you can emit structural values - please try. BTW emitting key field is good.

Upvotes: 1

Related Questions