Paul Gorton
Paul Gorton

Reputation: 712

mongodb aggregation sort

I have a database of students and their contact details. I'm trying to find out the postcode that houses the most students. The documents for the students look something like this...

{studentcode: 'smi0001', firstname: 'bob', surname: 'smith', postcode: 2001}

I thought I could use the aggregation framework to find out the postcode with the most students by doing something like...

db.students.aggregate({$project: { postcode: 1 }, $group: {_id: '$postcode', students: {$sum: 1}}})

this works as expected (returns postcodes as _id and the number of students in each postcode as 'students', but if I add $sort to the pipeline it seems to try sorting by the whole student collection instead of the results of the $group operation.

what I'm trying look like...

db.students.aggregate({$project: { postcode: 1 }, $group: {_id: '$postcode', students: {$sum: 1}}, $sort: {_id: -1}})

but it returns the whole collection and disregards the $project and $group... Am I missing something? I thought I'd just be able to sort by descending number of students and return the first item. Thanks in advance for any help.

Upvotes: 41

Views: 69521

Answers (2)

Thomas
Thomas

Reputation: 2638

I think your syntax is slightly wrong. Each aggregation operation in the pipeline should be its own document.

db.students.aggregate( {$project: ...}, {$group: ...}, {$sort: ...} )

In your case, it should be:

db.students.aggregate(
    {$project: { postcode: 1 }}, 
    {$group: {_id: '$postcode', students: {$sum: 1}}}, 
    {$sort: {students: -1}}
)

I've tested it on a sample collection based on your schema and it works for me, sorting the grouped post codes by number of students, descending.

Upvotes: 5

Eve Freeman
Eve Freeman

Reputation: 33175

You almost had it...

db.test.aggregate(
  {$group: {_id: '$postcode', students: {$sum: 1}}}, 
  {$sort: {_id: -1}}
);

gives (I added some test data matching your sample):

{
  "result" : [
    {
        "_id" : 2003,
        "students" : 3
    },
    {
        "_id" : 2002,
        "students" : 1
    },
    {
        "_id" : 2001,
        "students" : 2
    }
  ],
  "ok" : 1
}

You had an outer {} around everything, which was causing some confusion. The group and sort weren't working as separate operations in the pipeline.

You didn't really need the project for this case.

Update You probably want to sort by "students", like so, to get the biggest zipcodes (by population) first:

db.test.aggregate(
  {$group: {_id: '$postcode', students: {$sum: 1}}}, 
  {$sort: {students: -1}}
);

Upvotes: 81

Related Questions