I have a MongoDB Analytics-style collection. It contains documents with a timestamp
field and various data. Now I want to get a time series with the number of documents for a time period with a granularity parameter.
I'm currently using the aggregation framework like this (assuming that the granularity is DAY):
db.collection.aggregate([{
    $match: {
        timestamp: {
            $gte: start_time,
            $lt: end_time
        }
    }
}, {
    $group: {
        _id: {
            year: { $year: '$timestamp' },
            month: { $month: '$timestamp' },
            day: { $dayOfMonth: '$timestamp' }
        },
        count: { $sum: 1 }
    }
}, {
    $sort: {
        _id: 1
    }
}])
This way I get a count value for every day.
The problem is that the counts depend on the timezone used when computing the $dayOfMonth part: each count covers 00:00:00.000 UTC to 23:59:59.999 UTC.
I would like to achieve this without depending on the timezone, relying on the start_time instead.
For example, if I use a start_time at 07:00 UTC, I want counts for every interval from 07:00 UTC one day to 07:00 UTC the next day.
TL;DR: I want something like this: https://dev.twitter.com/ads/reference/get/stats/accounts/%3Aaccount_id/campaigns
Any idea how to do this?
Upvotes: 0
Views: 1580
I found a solution that works pretty well. It's not very natural, but it does the job.
The idea is to compute a "normalized" date based on the startDate and the date of the row. I use the $mod operator on the startDate to get its time of day, i.e. the hours + minutes + seconds + milliseconds elapsed since 00:00 UTC (in the case of a DAY granularity), and then I use $subtract to subtract that amount from the date of the row.
Here is an example for a DAY granularity:
var startDate = ISODate("2015-08-25 13:30:00.000Z")
var endDate = ISODate("2015-08-27 13:30:00.000Z")
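db.collection.aggregate([{
    $match: {
        timestamp: {
            $gte: startDate,
            $lt: endDate
        }
    }
}, {
    $project: {
        timestamp_normalized: {
            $subtract: [
                "$timestamp",
                {
                    // milliseconds elapsed since 00:00 UTC on the day of startDate
                    $mod: [
                        { $subtract: [ startDate, new Date("1970-01-01") ] },
                        1000 * 60 * 60 * 24
                    ]
                }
            ]
        }
    }
}, {
    // now $group with $dayOfMonth, as in the question,
    // but on the normalized field
    $group: {
        _id: {
            year: { $year: '$timestamp_normalized' },
            month: { $month: '$timestamp_normalized' },
            day: { $dayOfMonth: '$timestamp_normalized' }
        },
        count: { $sum: 1 }
    }
}, {
    $sort: {
        _id: 1
    }
}])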
The $mod part computes the time of day of the startDate (hours + minutes + seconds + milliseconds after 00:00 UTC), expressed in milliseconds.
The $subtract removes these milliseconds from the original timestamp.
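To check the arithmetic outside of MongoDB, here is a minimal sketch in plain JavaScript. The helper names offsetWithinDay and normalize are made up for illustration and are not part of any MongoDB API; the sketch only mirrors what the $mod and $subtract expressions compute.

```javascript
// Plain JavaScript sketch of the normalization arithmetic (no MongoDB needed).
// offsetWithinDay and normalize are hypothetical helper names.
const DAY_MS = 1000 * 60 * 60 * 24;

// Milliseconds elapsed since 00:00 UTC on the day of startDate
// (same value as the $mod expression in the pipeline).
function offsetWithinDay(startDate) {
  return startDate.getTime() % DAY_MS;
}

// Shift a timestamp back by that offset, so that every custom interval
// begins at 00:00 UTC in "normalized" time (same as the outer $subtract).
function normalize(timestamp, startDate) {
  return new Date(timestamp.getTime() - offsetWithinDay(startDate));
}

const startDate = new Date("2015-08-25T13:30:00.000Z");
const lastOfFirstInterval = normalize(new Date("2015-08-26T13:29:59.999Z"), startDate);
const firstOfSecondInterval = normalize(new Date("2015-08-26T13:30:00.000Z"), startDate);

console.log(offsetWithinDay(startDate));          // 48600000 (13 h 30 min)
console.log(lastOfFirstInterval.toISOString());   // 2015-08-25T23:59:59.999Z
console.log(firstOfSecondInterval.toISOString()); // 2015-08-26T00:00:00.000Z
```

A row at 13:29:59.999 on the 26th still lands on normalized day 25, while a row at 13:30:00.000 lands on normalized day 26, which is exactly the 13:30 to 13:30 bucketing wanted.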
Now I can use the $dayOfMonth operator on my timestamp_normalized field to get the day as if intervals ran from 13:30 to 13:30 the next day, and use $group to get count values for these intervals.
EDIT: It's even easier to compute the value to remove from the timestamp before creating the query, using:
(startDate - new Date(0)) % (1000 * 60 * 60 * 24)
(for a DAY granularity)
Then subtract this value directly from timestamp instead of using $mod.
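As a rough sketch of that simplified variant, the pipeline could be built by a small function. The name buildDailyCountPipeline is hypothetical, and the sketch assumes a DAY granularity and a date field called timestamp, as in the question. It relies on $subtract accepting a date and a number of milliseconds, and on the date operators accepting an expression as input, so no separate $project stage is needed.

```javascript
// Hypothetical helper (not from the original answer): builds the aggregation
// pipeline with the offset computed client-side, for a DAY granularity.
const DAY_IN_MS = 1000 * 60 * 60 * 24;

function buildDailyCountPipeline(startDate, endDate) {
  // Time of day of startDate, in milliseconds after 00:00 UTC.
  const offsetMs = (startDate - new Date(0)) % DAY_IN_MS;

  // Expression yielding the row's timestamp shifted back by the offset.
  const normalized = { $subtract: ["$timestamp", offsetMs] };

  return [
    { $match: { timestamp: { $gte: startDate, $lt: endDate } } },
    {
      $group: {
        _id: {
          year: { $year: normalized },
          month: { $month: normalized },
          day: { $dayOfMonth: normalized }
        },
        count: { $sum: 1 }
      }
    },
    { $sort: { _id: 1 } }
  ];
}

const pipeline = buildDailyCountPipeline(
  new Date("2015-08-25T13:30:00.000Z"),
  new Date("2015-08-27T13:30:00.000Z")
);
console.log(JSON.stringify(pipeline, null, 2));
```

In the shell this would be used as db.collection.aggregate(buildDailyCountPipeline(startDate, endDate)). Note that each _id is the normalized day; the real interval it stands for starts at 00:00 UTC of that day plus the offset (13:30 UTC in this example).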
Upvotes: 1