Reputation: 951
I'm currently experimenting with MongoDB. Using Twitter's Streaming API I collected a bunch of tweets (it seemed a good way to learn MongoDB's aggregation options).
I have the following query:
db.twitter.aggregate([
{ $group : { _id : '$status.user.screen_name', count: { $sum : 1 } } },
{ $sort : { count : -1, _id : 1 } },
{ $skip : 0 },
{ $limit : 5 },
]);
As expected, this is the result:
{
"result" : [
{
"_id" : "VacaturesBreda",
"count" : 5
},
{
"_id" : "breda_nws",
"count" : 3
},
{
"_id" : "BredaDichtbij",
"count" : 2
},
{
"_id" : "JobbirdUTITBaan",
"count" : 2
},
{
"_id" : "vacatures_nr1",
"count" : 2
}
],
"ok" : 1
}
The question is: how can I match on the user id_str and return the screen_name and, for example, the followers_count of the user? I tried to do this with { $project .... } but I kept ending up with an empty result set.
For those not familiar with the user object in Twitter's JSON response, here is a part of it (I just selected the first user in the db).
"user" : {
"id" : 2678963916,
"id_str" : "2678963916",
"name" : "JobbirdUT IT Banen",
"screen_name" : "JobbirdUTITBaan",
"location" : "Utrecht",
"url" : "http://www.jobbird.com",
"description" : "Blijf op de hoogte van de nieuwste IT/Automatisering vacatures in Utrecht, via http://Jobbird.com",
"protected" : false,
"verified" : false,
"followers_count" : 1,
"friends_count" : 1,
"listed_count" : 0,
"favourites_count" : 0,
"statuses_count" : 311,
"created_at" : "Fri Jul 25 07:35:48 +0000 2014",
...
},
Update: As requested, a clear example of the desired response (sorry for not adding it earlier).
So instead of grouping on the screen_name, I group on the id_str. Why, you might ask? It is possible to edit your screen_name, but you are still the same user for Twitter (so the last screen_name should be returned):
db.twitter.aggregate([
{ $group : { _id : '$status.user.id_str', count: { $sum : 1 } } },
{ $sort : { count : -1, _id : 1 } },
{ $skip : 0 },
{ $limit : 5 },
]);
And the response should look something like this:
{
"result" : [
{
"_id" : "123456789",
"screen_name": "awsome_screen_name",
"followers_count": 523,
"count" : 5
},
....
],
"ok" : 1
}
Upvotes: 0
Views: 2705
Reputation: 151072
You are looking for an operator that does not actually "aggregate" the content but simply carries a value through from the grouped documents, which is what the $first and $last operators do:
db.twitter.aggregate([
{ "$group": {
"_id": "$status.user.id_str",
"screen_name": { "$first": "$status.user.screen_name" },
// followers_count is a per-user value repeated on every tweet, so take it as-is rather than summing it
"followers_count": { "$first": "$status.user.followers_count" },
"count": { "$sum": 1 }
}},
{ "$sort": { "followers_count": -1, "count": -1 } },
{ "$limit": 5 }
])
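Here $first picks the value from the first document in each group. That is generally useful where the documents repeat related data for the grouping key, as the user object does on every tweet. Note that without a preceding $sort, which document counts as "first" or "last" depends on the order in which documents reach the $group stage.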
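To make the semantics concrete, here is a minimal sketch in plain JavaScript (not MongoDB code) that emulates how $first, $last and $sum: 1 behave within a group. The sample documents and the groupByUser helper are made up for illustration and are not part of any MongoDB API.

```javascript
// Plain-JavaScript emulation of $group with $first, $last and $sum.
// This is NOT MongoDB code; it only illustrates the accumulator semantics.
// The sample documents below are invented for illustration.
const tweets = [
  { status: { user: { id_str: "1", screen_name: "old_name", followers_count: 10 } } },
  { status: { user: { id_str: "2", screen_name: "other", followers_count: 7 } } },
  { status: { user: { id_str: "1", screen_name: "new_name", followers_count: 12 } } },
];

// Hypothetical helper: groups documents by status.user.id_str in input order.
function groupByUser(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const user = doc.status.user;
    let g = groups.get(user.id_str);
    if (!g) {
      // $first: captured once, when the first document of the group arrives
      g = { _id: user.id_str, first_screen_name: user.screen_name, count: 0 };
      groups.set(user.id_str, g);
    }
    // $last: overwritten by every document, so the final one wins
    g.last_screen_name = user.screen_name;
    g.followers_count = user.followers_count;
    // $sum: 1 counts the documents in the group
    g.count += 1;
  }
  return [...groups.values()];
}

const grouped = groupByUser(tweets);
console.log(grouped);
```

With this input, user "1" keeps "old_name" as the first screen name and "new_name" as the last, which is why the choice between $first and $last matters when a user has renamed their account.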
An alternative approach is to include the fields in the grouping key. You can restructure the output afterwards with $project:
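db.twitter.aggregate([
{ "$group": {
"_id": {
"_id": "$status.user.id_str",
"screen_name": "$status.user.screen_name"
},
// followers_count is a per-user value repeated on every tweet, so take it as-is rather than summing it
"followers_count": { "$first": "$status.user.followers_count" },
"count": { "$sum": 1 }
}},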
{ "$project": {
"_id": "$_id._id",
"screen_name": "$_id.screen_name",
"followers_count": 1,
"count": 1
}},
{ "$sort": { "followers_count": -1, "count": -1 } },
{ "$limit": 5 }
])
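This is useful where you are unsure whether the related fields are unique per grouping key: each distinct combination of id_str and screen_name ends up in its own group, so a user who changed their screen_name shows up once per name.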
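Since the question asks for the most recent screen_name, one option is to sort before grouping and use $last. The following is only a sketch: sorting on _id assumes the collection uses default ObjectId keys, which roughly follow insertion order. That assumption does not come from the question, so substitute whichever field reflects tweet order in your data.

```javascript
// Sketch only: the leading $sort on _id is an assumption (default ObjectId
// keys roughly follow insertion order). Replace it with a field that
// reflects the order of your tweets if that does not hold.
const pipeline = [
  { $sort: { _id: 1 } },
  { $group: {
      _id: "$status.user.id_str",
      // $last now means "value from the most recently inserted tweet"
      screen_name: { $last: "$status.user.screen_name" },
      followers_count: { $last: "$status.user.followers_count" },
      count: { $sum: 1 }
  }},
  // Same ordering as in the question: most tweets first, ties broken by _id
  { $sort: { count: -1, _id: 1 } },
  { $limit: 5 }
];

// In the mongo shell: db.twitter.aggregate(pipeline)
```

Because the documents reach $group in a known order, $last is well defined here instead of depending on however the collection happens to be scanned.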
Upvotes: 3