Antek

Reputation: 1267

MongoDB aggregation : time series with granularity

I have a MongoDB Analytics-style collection. It contains documents with a timestamp field and various data. Now I want to get a time series with the number of documents for a time period with a granularity parameter.

I'm currently using the aggregation framework like this (assuming that the granularity is DAY):

db.collection.aggregate([{
  $match: {
    timestamp: {
      $gte: start_time,
      $lt: end_time
    }
  }
}, {
  $group: {
    _id: {
      year: { $year: '$timestamp' },
      month: { $month: '$timestamp' },
      day: { $dayOfMonth: '$timestamp' }
    },
    count: { $sum: 1 }
  }
}, {
  $sort: {
    _id: 1
  }
}])

This way I get a count value for every day. The problem is that the day boundaries depend on the timezone used when computing the $dayOfMonth part (each count covers 00:00:00.000 UTC to 23:59:59.999 UTC).

I would like to achieve this without depending on the timezone, relying on the start_time instead. For example, if I use a start_time at 07:00 UTC, I want counts for every day from 07:00 UTC to 07:00 UTC the next day.

TL;DR: I want something like this: https://dev.twitter.com/ads/reference/get/stats/accounts/%3Aaccount_id/campaigns

Any idea how to do this?

Upvotes: 0

Views: 1580

Answers (1)

Antek

Reputation: 1267

I found a solution that works pretty well. It's not very natural, but it does the job.

The idea is to compute a "normalized" date based on the startDate and the date of the row. I use the $mod operator on the startDate to get its time of day, i.e. hours + minutes + seconds + milliseconds (in the case of a DAY granularity), and then I use $subtract to subtract it from the date of the row.

Here is an example for a DAY granularity:

var startDate = ISODate("2015-08-25 13:30:00.000Z")
var endDate   = ISODate("2015-08-27 13:30:00.000Z")

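db.collection.aggregate([{
  $match: {
    timestamp: {
      $gte: startDate,
      $lt: endDate
    }
  }
}, {
  $project: {
    timestamp_normalized: {
      $subtract: [
        "$timestamp",
        {
          // time of day of startDate, in milliseconds since 00:00 UTC
          $mod: [
            { $subtract: [ startDate, new Date("1970-01-01") ] },
            1000 * 60 * 60 * 24
          ]
        }
      ]
    }
  }
}, {
  // group on the normalized date, same shape as the $group in the question
  $group: {
    _id: {
      year: { $year: '$timestamp_normalized' },
      month: { $month: '$timestamp_normalized' },
      day: { $dayOfMonth: '$timestamp_normalized' }
    },
    count: { $sum: 1 }
  }
}, {
  $sort: {
    _id: 1
  }
}])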

The $mod part computes the hours + minutes + seconds + milliseconds of the startDate after 00:00 UTC, expressed in milliseconds.

The $subtract removes these milliseconds from the original timestamp.

Now I can use the $dayOfMonth operator on my timestamp_normalized field to get the day, considering intervals from 13:30 to 13:30 the next day, and use $group to get count values for these intervals.
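To see what the normalization does to individual timestamps, here is a small sketch in plain JavaScript that mirrors the $mod and $subtract steps outside of MongoDB. The helper name normalizedDay and the sample timestamps are only assumptions for illustration; they are not part of the pipeline above.

```javascript
// Hypothetical helper: mirrors the $mod / $subtract steps in plain JavaScript.
var DAY_MS = 1000 * 60 * 60 * 24;

function normalizedDay(timestamp, startDate) {
  // Milliseconds elapsed since 00:00 UTC on the day of startDate.
  var offsetMs = startDate.getTime() % DAY_MS;
  // Shift the timestamp back so that each window starts at 00:00 UTC.
  var shifted = new Date(timestamp.getTime() - offsetMs);
  // Keep only the UTC date part (YYYY-MM-DD) as the bucket label.
  return shifted.toISOString().slice(0, 10);
}

var startDate = new Date("2015-08-25T13:30:00.000Z");
var samples = [
  new Date("2015-08-25T13:30:00.000Z"), // first instant of the first window
  new Date("2015-08-26T13:29:59.999Z"), // last instant of the first window
  new Date("2015-08-26T13:30:00.000Z")  // first instant of the second window
];
var days = samples.map(function (t) { return normalizedDay(t, startDate); });
console.log(days);
```

The first two samples end up with the label 2015-08-25 and the third with 2015-08-26, so the bucket boundary sits at 13:30 UTC rather than at midnight.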

EDIT: It's even easier to compute the value to remove from the timestamp before creating the query, using:

(startDate - new Date(0)) % (1000 * 60 * 60 * 24)

(for a DAY granularity)

Then subtract this value directly from the timestamp instead of using $mod.
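As a rough sketch of that variant, the pipeline could be built like this, with the offset computed on the client. The function name buildDailyCountPipeline is hypothetical (it is not a MongoDB API), and grouping by year, month and day simply reuses the $group from the question.

```javascript
// Hypothetical helper (not a MongoDB API): builds the pipeline with the
// offset computed on the client, so no $mod is needed on the server.
var DAY_MS = 1000 * 60 * 60 * 24;

function buildDailyCountPipeline(startDate, endDate) {
  // Time elapsed since 00:00 UTC on the day of startDate, in milliseconds.
  var offsetMs = startDate.getTime() % DAY_MS;
  return [
    { $match: { timestamp: { $gte: startDate, $lt: endDate } } },
    // Date minus a number of milliseconds yields a shifted date.
    { $project: { timestamp_normalized: { $subtract: ["$timestamp", offsetMs] } } },
    { $group: {
        _id: {
          year: { $year: "$timestamp_normalized" },
          month: { $month: "$timestamp_normalized" },
          day: { $dayOfMonth: "$timestamp_normalized" }
        },
        count: { $sum: 1 }
    } },
    { $sort: { _id: 1 } }
  ];
}

var pipeline = buildDailyCountPipeline(
  new Date("2015-08-25T13:30:00.000Z"),
  new Date("2015-08-27T13:30:00.000Z")
);
// In the mongo shell: db.collection.aggregate(pipeline)
```

Each _id in the result labels a window that starts at the time of day of startDate (13:30 UTC here) on that date.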

Upvotes: 1
