ben39
ben39

Reputation: 2577

Mongo aggregation framework, Sort and then group not working

I am trying sort data by date first and then group on another field. It is not working to me.

The question I am trying to answer is: Select the most recent distinct cid?

given this data:

db.summary.save({"lid" : 5, "date" : 5, "cid" : 2, "circles" : [ 2 ] })
db.summary.save({"lid" : 2, "date" : 2, "cid" : 1, "circles" : [ 2 ] })
db.summary.save({"lid" : 4, "date" : 0, "cid" : 3, "circles" : [ 2 ] })
db.summary.save({"lid" : 3, "date" : 3, "cid" : 2, "circles" : [ 2 ] })
db.summary.save({"lid" : 1, "date" : 1, "cid" : 1, "circles" : [ 2 ] })

db.summary.aggregate( {$match :{circles: 2}, $sort: {date: -1}, $group: {_id: '$cid'}} )

I am doing a match first on circles, then a sort on date, then a group on cid

The result I am getting:

{
    "result" : [
        {
            "_id" : 3
        },
        {
            "_id" : 1
        },
        {
            "_id" : 2
        }
    ],
    "ok" : 1
}

Here is my analysis:

before matching or sorting by date, the data was:

"lid" : 5, "date" : 5, "cid" : 2
"lid" : 2, "date" : 2, "cid" : 1
"lid" : 4, "date" : 0, "cid" : 3
"lid" : 3, "date" : 3, "cid" : 2
"lid" : 1, "date" : 1, "cid" : 1

After sorting by date, the data set would be:

"lid" : 5, "date" : 5, "cid" : 2
"lid" : 3, "date" : 3, "cid" : 2
"lid" : 2, "date" : 2, "cid" : 1 
"lid" : 1, "date" : 1, "cid" : 1
"lid" : 4, "date" : 0, "cid" : 3

So after grouping, the result I expect is:

{
    "result" : [
        {
            "_id" : 2
        },
        {
            "_id" : 1
        },
        {
            "_id" : 3
        }
    ],
    "ok" : 1
}

What query does solve my problem?

Why is the current query not working for me?

Upvotes: 24

Views: 13271

Answers (1)

JohnnyHK
JohnnyHK

Reputation: 312129

When you $group after a $sort in the pipeline, the previous sort is lost. You'd have to do something like this instead so that the date you want to sort by is available after the grouping:

db.summary.aggregate(
    {$match: {circles: 2}},
    {$group: {_id: '$cid', date: {$max: '$date'}}},
    {$sort: {date: -1}});

result:

[ { _id: 2, date: 5 }, 
  { _id: 1, date: 2 }, 
  { _id: 3, date: 0 } ]

Add a $project to the end of the pipeline if you want to reshape the output.

Upvotes: 40

Related Questions