Reputation: 397
I have a simple collection:
{
    "_id" : ObjectId("5033cc15f31e20b76ca842c8"),
    "_class" : "com.pandu.model.alarm.Alarm",
    "serverName" : "CDCAWR009 Integration Service",
    "serverAddress" : "cdcawr009.na.convergys.com",
    "triggered" : ISODate("2012-01-28T05:09:03Z"),
    "componentName" : "IntegrationService",
    "summary" : "A device which is configured to be recorded is not being recorded.",
    "details" : "Extension<153; 40049> on CDCAWR009 is currently not being recorded properly; recording requested for the following reasons: ",
    "priority" : "Major"
}
There will be around a couple of million such documents in the collection. I am trying to group by server name and get a count for each server name. Sounds simple from an RDBMS query point of view.
The query that I have come up with is
db.alarm.group({
    key: { serverName: true },
    reduce: function (obj, prev) { prev.count++; },
    initial: { count: 0 }
});
Also, I have added an index on serverName.
> db.alarm.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "test.alarm",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "serverName" : 1
        },
        "ns" : "test.alarm",
        "name" : "serverName_1"
    }
]
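For completeness, the serverName index was created with a plain ensureIndex call, along these lines:

db.alarm.ensureIndex({ serverName : 1 });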
However, I am getting a response from MongoDB only after 13 seconds, whereas in SQL Server a similar query returns within 4 seconds, and that without an index.
Is there anything I am missing?
Thanks in anticipation.
Upvotes: 3
Views: 920
Reputation: 2396
Another option, and perhaps the most performant solution for the time being, is to use the distinct() command and count the results client-side. http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct
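In the mongo shell that might look something like this (untested sketch; distinct() returns the distinct values as an array, so the counting happens on the client):

// Pull back the distinct server names and count them client-side.
var serverNames = db.alarm.distinct("serverName");
print(serverNames.length);   // number of distinct server names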
Upvotes: 2
Reputation: 21692
As you can see from the query that you wrote, this type of aggregation in 2.0 requires you to run Map/Reduce-style JavaScript (group() runs your reduce function the same way). Map/Reduce on MongoDB has some performance penalties that have been covered on SO before: basically, unless you are able to parallelize across a cluster, you are going to be running single-threaded JavaScript via SpiderMonkey, which is not a speedy proposition. The index does not really help here, since you are not being selective; you still have to scan the whole index as well as, potentially, the documents.
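For comparison, the explicit Map/Reduce form of the same count would look roughly like this (untested; it carries the same single-threaded JavaScript cost as group()):

db.alarm.mapReduce(
    function () { emit(this.serverName, 1); },              // map: emit a 1 per document
    function (key, values) { return Array.sum(values); },   // reduce: sum the emitted 1s
    { out: { inline: 1 } }                                   // return results inline rather than to a collection
);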
With the imminent release of 2.2 (currently in rc1 as of this writing) you have some options, though. The aggregation framework introduced in 2.2 (which is native, not JavaScript-based Map/Reduce) has a built-in $group operator and was created specifically to speed up this kind of operation in MongoDB.
I would recommend giving 2.2 a shot to see if your grouping performance improves. I think it would look something like this (note: not tested):
db.alarm.aggregate(
    { $group : {
        _id : "$serverName",
        count : { $sum : 1 }
    }}
);
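If you also want the most common server names first, a $sort stage can be added as a second pipeline step (again untested):

db.alarm.aggregate(
    { $group : { _id : "$serverName", count : { $sum : 1 } } },
    { $sort : { count : -1 } }   // highest counts first
);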
Upvotes: 4