Reputation: 8587
I was wondering if someone could help me get my aggregation function right. I'm trying to count the number of times a piece of text appears per hour in a specified day. So far I've got:
db.daily_data.aggregate(
[
{ $project : { useragent: 1, datetime: 1, url: 1, hour: {$hour: new Date("$datetime")} } },
{ $match : { datetime: {$gte: 1361318400000, $lt: 1361404800000}, useragent: /.*LinkCheck by Siteimprove.*/i } },
{ $group : { _id : { useragent: "$useragent", hour: "$hour" }, queriesPerUseragent: {$sum: 1} } }
]
);
But I'm obviously getting it wrong as hour is always 0:
{
"result" : [
{
"_id" : {
"useragent" : "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0) LinkCheck by Siteimprove.com",
"hour" : 0
},
"queriesPerUseragent" : 94215
}
],
"ok" : 1
}
Here's a trimmed down example of a record too:
{
"_id" : ObjectId("50fe63c70266a712e8663725"),
"useragent" : "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.0) LinkCheck by Siteimprove.com",
"datetime" : NumberLong("1358848954813"),
"url" : "http://www.somewhere.com"
}
I've also tried using new Date("$datetime").getHours() instead of the $hour operator to get the same result, but with no luck. Can someone point me in the direction of where I'm going wrong?
Thanks!
Upvotes: 3
Views: 6155
Reputation: 2225
This is a recommendation rather than an answer to your problem.
For analytics on MongoDB, it's recommended to pre-aggregate your buckets (hourly buckets in your use case) for every metric you want to calculate.
So, for your metric, you can increment a counter in a pre-aggregated collection every time a request is received, which speeds up your query time:
db.user_agent_hourly.update({url: "your_url", useragent: "your user agent", hour: current_HOUR_of_DAY, date: current_DAY_Date}, {$inc: {count: 1}}, {upsert: true})
Take into account that current_DAY_Date has to be a stable date value for the current day, i.e., current_year/current_month/current_day 00:00:00, so that every metric received on the same day uses the same hour:minute:second and lands in the same document.
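To make the bucket key concrete, here is a minimal sketch in plain JavaScript of how the two values could be derived from a millisecond timestamp like the one in your record. The helper name hourlyBucket is made up for illustration, and it assumes you want the buckets in UTC; adjust if your days should follow a local time zone.

```javascript
// Hypothetical helper (not part of MongoDB): derive the bucket key fields
// from an epoch-millisecond timestamp. Assumes UTC day boundaries.
function hourlyBucket(epochMillis) {
  const d = new Date(epochMillis);
  // Midnight (00:00:00.000) of the same UTC day: the stable date value.
  const day = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  return { date: day, hour: d.getUTCHours() };
}

// The "datetime" value from the sample record in the question.
const bucket = hourlyBucket(1358848954813);
console.log(bucket.date.toISOString(), bucket.hour);
```

The date and hour fields returned here would play the roles of current_DAY_Date and current_HOUR_of_DAY in the update above.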
Then, you can query this collection, extracting aggregated analytics for any given period of time as follows:
db.user_agent_hourly.aggregate([
{$match: {date: {$gte: INITIAL_DATE, $lt: FINAL_DATE}}},
{$group: {_id: {useragent: "$useragent", hour: "$hour"}, queriesPerUseragent: {$sum: "$count"}}},
{$sort: {queriesPerUseragent: -1}}
])
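As a rough sketch of what INITIAL_DATE and FINAL_DATE could look like for a single day, the snippet below builds a half-open range in plain JavaScript. The helper name utcDayRange is invented here, and it assumes the date field holds UTC midnight values; for 2013-02-20 it reproduces the two millisecond bounds used in the $match of the question.

```javascript
// Hypothetical helper: half-open [INITIAL_DATE, FINAL_DATE) range covering
// one UTC day. Assumes the "date" field stores UTC midnight values.
function utcDayRange(year, month, day) {
  const start = new Date(Date.UTC(year, month - 1, day)); // month is 1-based here
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);
  return { INITIAL_DATE: start, FINAL_DATE: end };
}

const range = utcDayRange(2013, 2, 20);
console.log(range.INITIAL_DATE.getTime(), range.FINAL_DATE.getTime());
```

Using $gte for the start and $lt for the end means consecutive days never overlap.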
If you want to filter the results by a specific user agent, you can use the following query:
db.user_agent_hourly.aggregate([
{$match: {date: {$gte: INITIAL_DATE, $lt: FINAL_DATE}, useragent: "your_user_agent"}},
{$group: {_id: {useragent: "$useragent", hour: "$hour"}, queriesPerUseragent: {$sum: "$count"}}}
])
PS: We store every single received metric in another collection so we can reprocess it in case of disaster or other needs.
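For what it's worth, reprocessing could look roughly like the sketch below, written in plain JavaScript over an in-memory array so it runs without a database. The sample events and the function name rebuildHourlyCounts are made up for illustration, and UTC buckets are assumed; a real job would read the raw collection and upsert the results.

```javascript
// Hypothetical raw events, shaped like the record in the question.
const rawEvents = [
  { useragent: "LinkCheck", url: "http://www.somewhere.com", datetime: 1358848954813 },
  { useragent: "LinkCheck", url: "http://www.somewhere.com", datetime: 1358848999000 },
  { useragent: "LinkCheck", url: "http://www.somewhere.com", datetime: 1358852554813 },
];

// Rebuild the hourly counters from raw events (UTC buckets assumed).
function rebuildHourlyCounts(events) {
  const counts = new Map();
  for (const e of events) {
    const d = new Date(e.datetime);
    const day = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
    // One counter per (url, useragent, day, hour) combination.
    const key = JSON.stringify([e.url, e.useragent, day, d.getUTCHours()]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Turn the map back into documents shaped like the pre-aggregated ones.
  return [...counts].map(([key, count]) => {
    const [url, useragent, day, hour] = JSON.parse(key);
    return { url, useragent, date: new Date(day), hour, count };
  });
}

const rebuilt = rebuildHourlyCounts(rawEvents);
console.log(rebuilt);
```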
Upvotes: 3