Reputation: 1881
I have a rather large collection in mongodb with approx 100 000 documents (not sharded). This is a backend for a web application that basically just allow the user to browse different ways to view the same information in this collection.
For one of the views I try to count the number of occurrences of a field using the aggregation framework. This means aggregating the entire collection. The problem is that this aggregation operation (which is a simple pipeline of group, sort and limit) takes 2 seconds which is too slow for a web application.
So my question is; what is the preferred solution in order to cache the result of this aggregation operation? As far as I have found, it is not possible to "aggregate into" a new collection, or anything like that. At the moment the only solution I have found is to read the entire result into an variable and then inserting this variable into a new collection using insert - but I fear this involves sending a lot of data from the database => to my application => back to the database?
Any suggestions?
Example of pipeline:
res = items.aggregate([
{ "$group": { "_id": { "item_id": "$item_id", "title": "$title", "category": "$category" }, "count": { "$sum": 1 } } },
{ "$sort": { "count": -1 } },
{ "$limit": 5 }
])
The schema is basically those 3 fields + a few more that is really not relevant, i.e:
doc = {
"item_id": 1000,
"title": "this is the item title",
"category": "this is the item category"
}
I have tried index on both item_id and all 3 fields with no success.
Upvotes: 3
Views: 2564
Reputation: 668
Aggregation returns result to one document. The result is limited by 16M. The document is returned to app.
If you want to "aggregate" to a collection - use map-reduce.
map_function = function () {
emit(this.item_id, {"item_id": this.item_id, /* any other info */ "count": 1});
};
reduce_function = function (key, values) {
var result = {"item_id": key, /* any other info should be given from one or any of values array objects */ "count": 0};
values.forEach(function (value) {
result["count"] += value["count"];
});
return result;
};
Not sure whether you can emit structural values - please try. BTW emitting key field is good.
Upvotes: 1