group in aggregate framework stopped working properly

Question

I dislike asking this kind of question, but maybe you can point me to something obvious. I'm using MongoDB 2.2.2.

I have a collection (in a replica set) with 6M documents. Each document has a string field called username, which is indexed. The index was non-unique, but I recently made it unique. Suddenly, the following query gives me false alarms that I have duplicates:

db.users.aggregate(
    { $group : {_id : "$username", total : { $sum : 1 } } },
    { $match : { total : { $gte : 2 } } },
    { $sort : {total : -1} } );

which returns

{
        "result" : [
                {
                        "_id" : "davidbeges",
                        "total" : 2
                },
                {
                        "_id" : "jesusantonio",
                        "total" : 2
                },
                {
                        "_id" : "elesitasweet",
                        "total" : 2
                },
                {
                        "_id" : "theschoolofbmx",
                        "total" : 2
                },
                {
                        "_id" : "longflight",
                        "total" : 2
                },
                {
                        "_id" : "thenotoriouscma",
                        "total" : 2
                }
        ],
        "ok" : 1
}

I tested this query on a sample collection with a few documents, and it works as expected.

expert · Accepted Answer

Someone from 10gen responded in their JIRA:

Are there any updates on this collection? If so, I'd try adding {$sort: {username:1}} to the front of the pipeline. That will ensure that you only see each username once if it is unique. If there are updates going on, it is possible that aggregation would see a document twice if it moves due to growth. Another possibility is that a document was deleted after being seen by the aggregation and a new one was inserted with the same username.

So adding a sort by username at the front of the pipeline, before the grouping, fixed it.
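
For reference, here is a minimal sketch of what the adjusted pipeline might look like: the original query from the question with the suggested $sort stage placed first. The variable name pipeline and the guard around db are my own additions so the snippet can be pasted as-is; the stages themselves come from the question and the quoted advice. Presumably the leading sort lets the aggregation walk the username index instead of scanning in on-disk order, but that is my reading of the JIRA reply, not something it states explicitly.

```javascript
// Pipeline from the question, with the suggested $sort stage added at the front.
// `pipeline` is just an illustrative variable name, not part of any API.
var pipeline = [
    { $sort : { username : 1 } },                              // suggested fix: sort on the indexed field first
    { $group : { _id : "$username", total : { $sum : 1 } } },  // count documents per username
    { $match : { total : { $gte : 2 } } },                     // keep only usernames seen more than once
    { $sort : { total : -1 } }                                 // largest counts first
];

// Run only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    printjson(db.users.aggregate(pipeline));
}
```

If the index really is unique, this should come back with an empty result even while the collection is being updated.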
