Aggregate in MongoDB return more fields

Question

I'm currently experimenting with MongoDB. Using Twitter's Streaming API, I collected a bunch of tweets (it seemed a good way to learn MongoDB's aggregation options).

I have the following query:

db.twitter.aggregate([
    { $group : { _id : '$status.user.screen_name', count: { $sum : 1 } } },
    { $sort : { count : -1, _id : 1 } },
    { $skip : 0 },
    { $limit : 5 },
]);

As expected, this is the result:

{
    "result" : [ 
        {
            "_id" : "VacaturesBreda",
            "count" : 5
        }, 
        {
            "_id" : "breda_nws",
            "count" : 3
        }, 
        {
            "_id" : "BredaDichtbij",
            "count" : 2
        }, 
        {
            "_id" : "JobbirdUTITBaan",
            "count" : 2
        }, 
        {
            "_id" : "vacatures_nr1",
            "count" : 2
        }
    ],
    "ok" : 1
}

The question is: how can I group on the user's id_str and also return the screen_name and, for example, the followers_count of the user? I tried to do this with { $project .... } but I kept ending up with an empty result set.

For those not familiar with the user object in Twitter's JSON response, here is part of it (I just selected the first user in the db):

"user" : {
        "id" : 2678963916,
        "id_str" : "2678963916",
        "name" : "JobbirdUT IT Banen",
        "screen_name" : "JobbirdUTITBaan",
        "location" : "Utrecht",
        "url" : "http://www.jobbird.com",
        "description" : "Blijf op de hoogte van de nieuwste IT/Automatisering vacatures in Utrecht, via http://Jobbird.com",
        "protected" : false,
        "verified" : false,
        "followers_count" : 1,
        "friends_count" : 1,
        "listed_count" : 0,
        "favourites_count" : 0,
        "statuses_count" : 311,
        "created_at" : "Fri Jul 25 07:35:48 +0000 2014",
        ...
    },

Update: As requested, here is a clear example of the desired response (sorry for not adding it earlier).

So instead of grouping on the screen_name, I want to group on the id_str. Why, you might ask? It is possible to change your screen_name, but you are still the same user to Twitter (so the last screen_name should be returned):

db.twitter.aggregate([
    { $group : { _id : '$status.user.id_str', count: { $sum : 1 } } },
    { $sort : { count : -1, _id : 1 } },
    { $skip : 0 },
    { $limit : 5 },
]);

And the response should look something like this:

{
    "result" : [ 
        {
            "_id" : "123456789",
            "screen_name": "awsome_screen_name",
            "followers_count": 523,
            "count" : 5
        }, 
        ....
    ],
    "ok" : 1
}

Neil Lunn · Accepted Answer

You are basically looking for an operator that does not actually "aggregate" the content but simply picks one value from each group, and this is what the $first and $last operators do:

db.twitter.aggregate([
    { "$group": {
        "_id": "$status.user.id_str",
        "screen_name": { "$first": "$status.user.screen_name" },
        "followers_count": { "$first": "$status.user.followers_count" },
        "count": { "$sum": 1 }
    }},
    { "$sort": { "followers_count": -1, "count": -1 } },
    { "$limit": 5 }
])

This picks the "first" occurrence of the field for each grouping key. That is generally useful where the documents carry duplicated data that is related to the grouping key. Use $last in the same way if you want the final occurrence instead.
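
To make the semantics concrete, here is a rough plain-JavaScript emulation of what $first and $last do inside a group. It runs without a database. The sample documents and the helper name groupByUser are made up for illustration, and this is only a sketch of the behaviour, not how MongoDB implements it. Keep in mind that unless a $sort stage precedes the $group, "first" and "last" simply follow the order in which the documents reach the stage.

```javascript
// Hypothetical sample data: user "1" changed screen_name between two tweets.
const sampleTweets = [
  { status: { user: { id_str: "1", screen_name: "old_name", followers_count: 10 } } },
  { status: { user: { id_str: "2", screen_name: "someone", followers_count: 7 } } },
  { status: { user: { id_str: "1", screen_name: "new_name", followers_count: 12 } } },
];

// Plain-JS stand-in for a $group on "$status.user.id_str" that keeps the
// $first and $last screen_name plus a { $sum: 1 } counter.
function groupByUser(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const user = doc.status.user;
    if (!groups.has(user.id_str)) {
      // The first document seen for this key supplies the $first value.
      groups.set(user.id_str, {
        _id: user.id_str,
        first_screen_name: user.screen_name,
        last_screen_name: user.screen_name,
        count: 0,
      });
    }
    const group = groups.get(user.id_str);
    group.last_screen_name = user.screen_name; // overwritten each time, so it ends as $last
    group.count += 1;                          // same effect as { $sum: 1 }
  }
  return [...groups.values()];
}

const grouped = groupByUser(sampleTweets);
console.log(grouped);
```

For the renamed user, first_screen_name and last_screen_name differ, which is exactly the case where choosing between $first and $last matters.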

An alternate approach is to include the fields in the grouping key. You can later restructure with $project:

db.twitter.aggregate([
    { "$group": {
        "_id": { 
            "_id": "$status.user.id_str",
            "screen_name": "$status.user.screen_name"
        },
        "followers_count": { "$first": "$status.user.followers_count" },
        "count": { "$sum": 1 }
    }},
    { "$project": {
        "_id": "$_id._id",
        "screen_name": "$_id.screen_name",
        "followers_count": 1,
        "count": 1
    }},
    { "$sort": { "followers_count": -1, "count": -1 } },
    { "$limit": 5 }
])

This is useful where you are unsure of the related "uniqueness", since every distinct combination of the key fields gets its own group.
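
As a rough illustration of what a composite grouping key means in practice, here is a plain-JavaScript sketch that groups on the pair (id_str, screen_name). The sample documents and the helper name groupByIdAndName are hypothetical, and this only mimics the behaviour outside the database. A user who changed screen_name ends up in two groups, one per distinct pair, which may or may not be what you want.

```javascript
// Hypothetical sample data: user "1" appears under two different screen names.
const renamedUserTweets = [
  { status: { user: { id_str: "1", screen_name: "old_name" } } },
  { status: { user: { id_str: "2", screen_name: "someone" } } },
  { status: { user: { id_str: "1", screen_name: "new_name" } } },
];

// Plain-JS stand-in for a $group whose _id is
// { _id: "$status.user.id_str", screen_name: "$status.user.screen_name" },
// followed by a $project that flattens the key again.
function groupByIdAndName(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const { id_str, screen_name } = doc.status.user;
    // Serialize both key fields so each distinct pair maps to its own group.
    const key = JSON.stringify([id_str, screen_name]);
    const group = groups.get(key) || { _id: id_str, screen_name: screen_name, count: 0 };
    group.count += 1; // same effect as { $sum: 1 }
    groups.set(key, group);
  }
  return [...groups.values()];
}

const byIdAndName = groupByIdAndName(renamedUserTweets);
console.log(byIdAndName);
```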
