Reputation: 1051
I save tweets to MongoDB:
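var util = require('util');

// `twit` (a streaming Twitter client) and `collectionAnimal` (a MongoDB
// collection) are created elsewhere.
twit.stream('statuses/filter', { 'track': ['animal'] }, function (stream) {
  stream.on('data', function (data) {
    console.log(util.inspect(data));
    // Store created_at as a real Date so MongoDB date operators can use it.
    data.created_at = new Date(data.created_at);
    collectionAnimal.insert(data, function (err, docs) {
      if (err) console.error(err);
    });
  });
});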
This works.
The tweet time is stored in the created_at field in the format 2014-04-25 11:45:14 GMT. Now I need to group created_at by hour. I would like a result like this:
hour | number of tweets in that hour
1 | 28
2 | 26
3 | 32
4 | 42
5 | 36
...
My unsuccessful attempt:
$keys = array('created_at' => true);
$initial = array('count' => 0);
$reduce = "function(doc, prev) { prev.count += 1 }";
$tweetsGroup = $this->collectionAnimal->group( $keys, $initial, $reduce );
But I am not able to group by hour: keying on created_at groups by each exact timestamp instead.
How can I do it?
Upvotes: 17
Views: 23575
Reputation: 140
Since MongoDB 5.0 (released in 2021), you can use $dateTrunc with the unit: 'hour' argument:
db.tweets.aggregate([
{
$project: {
hour: { $dateTrunc: { date: "$created_at", unit: "hour" } },
},
},
{
$group: {
_id: "$hour",
count: { $sum: 1 },
},
},
])
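To make the bucketing concrete without a database, here is a minimal plain-JavaScript sketch of what that pipeline computes. It assumes created_at holds real Date values and uses UTC, which is what $dateTrunc uses when no timezone is given. The function names are hypothetical, and this only models the result; it is not how MongoDB evaluates the stage. Note that each bucket is one specific hour on one specific day, not an hour of the day across all days.

```javascript
// Plain-JS model of { $dateTrunc: { date: "$created_at", unit: "hour" } }
// followed by { $group: { _id: "$hour", count: { $sum: 1 } } }.
// Assumes UTC, the $dateTrunc default timezone.
function truncateToHour(date) {
  const d = new Date(date.getTime());
  d.setUTCMinutes(0, 0, 0); // zero out minutes, seconds and milliseconds
  return d;
}

function countPerHourBucket(docs) {
  const counts = new Map(); // key: ISO string of the start of the hour
  for (const doc of docs) {
    const key = truncateToHour(doc.created_at).toISOString();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Hypothetical sample documents.
const sample = [
  { created_at: new Date("2014-04-25T11:45:14Z") },
  { created_at: new Date("2014-04-25T11:02:00Z") },
  { created_at: new Date("2014-04-25T12:10:30Z") },
];
// 2014-04-25T11:00:00.000Z -> 2, 2014-04-25T12:00:00.000Z -> 1
console.log(countPerHourBucket(sample));
```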
Upvotes: 1
Reputation: 3324
Lalit's answer did not work for me; it kept giving me zeroes. Instead I did:
db.tweets.aggregate([
  { "$project": {
      "y": { "$year": "$created_at" },
      "m": { "$month": "$created_at" },
      "d": { "$dayOfMonth": "$created_at" },
      "h": { "$hour": "$created_at" },
      "tweet": 1 }
  },
  { "$group": {
      "_id": { "year": "$y", "month": "$m", "day": "$d", "hour": "$h" },
      'count': { $sum: 1 }
  } }
])
The 'count': {$sum: 1} is the only difference.
It might help someone new to MongoDB like me.
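For anyone else confused by the zeroes, a simplified model of the $sum accumulator shows why. This is a hypothetical plain-JavaScript helper, not MongoDB code, and it only handles top-level field paths. It follows the documented rule that $sum ignores missing and non-numeric values: summing a field the documents don't have yields 0, while the constant 1 adds one per document.

```javascript
// Simplified model of $sum inside $group (illustration only):
// - a numeric constant adds that constant once per document;
// - a "$field" path adds the field's value only when it is a number,
//   so missing or non-numeric values contribute nothing.
function sumAccumulator(docs, expr) {
  let total = 0;
  for (const doc of docs) {
    if (typeof expr === "number") {
      total += expr;
    } else if (typeof expr === "string" && expr.startsWith("$")) {
      const value = doc[expr.slice(1)]; // top-level fields only, for brevity
      if (typeof value === "number") total += value;
    }
  }
  return total;
}

// Hypothetical tweet documents: they have `text`, but no `tweet` field.
const docs = [{ text: "a" }, { text: "b" }, { text: "c" }];
console.log(sumAccumulator(docs, 1));        // 3
console.log(sumAccumulator(docs, "$tweet")); // 0
```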
Upvotes: 6
Reputation: 151132
There should be no need to use a $project stage here, as the date operators can be used directly in the $group stage when defining the grouping _id. This avoids an extra stage that has to pass over every document in the collection.
Also, you are just counting, so simply use { "$sum": 1 }. Summing a field that doesn't exist is what produced the 0 results:
$this->collection->aggregate(array(
array(
'$group' => array(
"_id" => array(
"y" => array( '$year' => '$created_at' ),
"m" => array( '$month' => '$created_at' ),
"d" => array( '$dayOfMonth' => '$created_at' ),
"h" => array( '$hour' => '$created_at' ),
),
"total" => array( '$sum' => 1 ),
),
)
));
If anything, add a $match stage at the start of the pipeline to filter by date. If output for a single day is acceptable, you only need $hour in the grouping _id, and the smaller working set makes the query faster. That is probably what you want anyway.
Upvotes: 11
Reputation: 2354
Here is how you can group using the aggregation framework directly in the mongo shell:
db.tweets.aggregate(
{ "$project": {
"y":{"$year":"$created_at"},
"m":{"$month":"$created_at"},
"d":{"$dayOfMonth":"$created_at"},
"h":{"$hour":"$created_at"},
"tweet":1 }
},
{ "$group":{
"_id": { "year":"$y","month":"$m","day":"$d","hour":"$h"},
"total":{ "$sum": "$tweet"}
}
})
For more options you can look here: http://docs.mongodb.org/manual/reference/operator/aggregation-date/
You will also need to find the appropriate way of using the aggregation framework from whichever programming language you are using.
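For Node.js, which the question already uses to store the tweets, one option is to build the pipeline as a plain array and pass it to the official mongodb driver. This is a sketch under stated assumptions: the helper name is hypothetical, it counts with { $sum: 1 } rather than summing $tweet (the stored tweets have no tweet field, so that sum comes out as 0), and the trailing $sort is an addition for chronological output.

```javascript
// Hypothetical helper: builds a per-hour count pipeline for a Date field.
function hourlyCountPipeline(field) {
  const path = "$" + field;
  return [
    {
      $group: {
        _id: {
          year: { $year: path },
          month: { $month: path },
          day: { $dayOfMonth: path },
          hour: { $hour: path },
        },
        count: { $sum: 1 }, // one per document
      },
    },
    // $group output order is unspecified, so sort chronologically.
    { $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1, "_id.hour": 1 } },
  ];
}

console.log(JSON.stringify(hourlyCountPipeline("created_at"), null, 2));
```

Assuming the mongodb driver 3.x or later and a hypothetical `collection` handle, it would be run as `const rows = await collection.aggregate(hourlyCountPipeline("created_at")).toArray();`.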
Upvotes: 30