nkare
nkare

Reputation: 397

mongodb groupby slow even after adding index

I have a simple collection :

{
    "_id" : ObjectId("5033cc15f31e20b76ca842c8"),
    "_class" : "com.pandu.model.alarm.Alarm",
    "serverName" : "CDCAWR009 Integration Service",
    "serverAddress" : "cdcawr009.na.convergys.com",
    "triggered" : ISODate("2012-01-28T05:09:03Z"),
    "componentName" : "IntegrationService",
    "summary" : "A device which is configured to be recorded is not being recorded.",
    "details" : "Extension<153; 40049> on CDCAWR009 is currently not being recorded
    properly; recording requested for the following reasons: ",
    "priority" : "Major"
}

there will be around couple of millions of such documents in the collection. I am trying to group by the server name and get a count of all server name. Sounds simple from RDBMS query point of view.

The query that I have come up with is 
    db.alarm.group( {key: { serverName:true }, reduce: function(obj,prev) { prev.count++ }, initial: { count: 0 }});

Also, I have added an index on serverName.

> db.alarm.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "test.alarm",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "serverName" : 1
                },
                "ns" : "test.alarm",
                "name" : "serverName_1"
        }
]

However, i am getting a response in mongodb after 13 seconds. whereas in sql server, similar query returns back within 4 seconds that too without an index.

Is there anything I am missing?

Thanks in anticipation.

Upvotes: 3

Views: 920

Answers (2)

Jenna
Jenna

Reputation: 2396

Another option, and perhaps the most performant solution for the time being, may be to use the distinct() command and count the results client-side. http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct

Upvotes: 2

Adam Comerford
Adam Comerford

Reputation: 21692

As you can see from the query that you wrote, this type of aggregation in 2.0 requires you to run Map/Reduce. Map/Reduce on MongoDB has some performance penalties which have been covered on SO before - basically unless you are able to parallelize across a cluster you are going to be running single threaded javascript via Spidermonkey - not a speedy proposition. The index, since you are not being selective, does not really help - you just have to scan the whole index as well as potentially the document.

With the imminent release of 2.2 (currently in rc1 as of writing this) you have some options though. The aggregation framework (which is native, not JS based Map/Reduce) introduced in 2.2 has a built in group operator and was created specifically to speed up this kind of operation in MongoDB.

I would recommend giving 2.2 a shot and see if your performance on grouping improves. I think it would look something like this (note: not tested):

db.alarm.aggregate(
    { $group : {
        _id : "$serverName",
        count : { $sum : 1 }
    }}
);

Upvotes: 4

Related Questions