Grouping time-series data by time intervals

Question

Let's say we are storing data for 1000s of devices that collect a single type of data every 10s. Each device can be located in a different timezone. The ability to query quickly to visualize the data is important. We can ask the system questions such as the following:

1. For a specific device, I want the last 7 days of data grouped by day totals for my local timezone.
2. For a specific device, I want the last year's data grouped by month totals for my local timezone.

Storing all the data in UTC seems like the cleanest approach, however it becomes tricky when asking for local groupings of the data. For example, a day grouping for each timezone has different offsets. So if we were to store in say day, month, year "buckets" they would all be grouped relative to UTC which would not be useful for asking questions for timezones other than UTC itself.

If we were to group the data in minute and hour "buckets" (ignoring timezones that are off by less than an hour, e.g. IST +5:30) we could use the hour "buckets" to construct the answers to the above questions. For question 2, there would be 12 groupings of up to 744 hour "buckets" for each grouping.

Does the approach with minute and hour (ignoring timezones that are off by less than an hour, e.g. IST +5:30) "buckets" seem like a decent design? Has anyone designed something similar with a different suggestion?

Matt Johnson-Pint · Accepted Answer

Yes, it's a reasonable design to create buckets by offset, and this occurs often in data warehousing (for example).

Though bucketing by 1 hour increments means ignoring many real places. As you pointed out, India is one location that uses a :30 offset. If you want to cover every modern time zone in the world, you actually need to bucked by 15 minute segments, as there are several that are either :30 or :45 offset.

Of course, if you find it acceptable to have a margin of error, then you can use whatever granularity you can tolerate. In theory, you could go larger than an hour - you'd just have a larger margin of error.

If you want to consider a different approach, you can store the value in a date-time-offset form, using the local time of the device. Most databases will convert to UTC when indexing such a value, so you may also need a computed column that extracts and indexes just the local time portion. Then you can group by the day in local time without having to necessarily be aware of how that ties back to UTC. The downside of this approach is that the data is fixed to its original time zone. You can't easily regroup to infer a different time zone. Though if these are actual devices in the real world, that is usually not a concern.

Grouping time-series data by time intervals

Answers (1)

Related Questions