Is it possible to calculate a cumulative sum or moving average with a TimescaleDB continuous aggregate?

Question

Consider a table with 2 columns:

create table foo
(
    ts             timestamp,
    precipitation  numeric,
    primary key (ts)
);

with the following data:

ts	precipitation
2021-06-01 12:00:00	1
2021-06-01 13:00:00	0
2021-06-01 14:00:00	2
2021-06-01 15:00:00	3

I would like to use a TimescaleDB continuous aggregate to calculate a three hour cumulative sum of this data that is calculated once per hour. Using the example data above, my continuous aggregate would contain

ts	cum_precipitation
2021-06-01 12:00:00	1
2021-06-01 13:00:00	1
2021-06-01 14:00:00	3
2021-06-01 15:00:00	5

I can't see a way to do this with the supported syntax for continuous aggregrates. Am I missing something? Essentially, I would like the time bucket to be the preceding x hours, but the calculation to occur hourly.

davidk · Accepted Answer

Good question!

You can do this by calculating a normal continuous aggregate and then doing a window function over it. So, calculate a sum() for each hour and then do sum() as a window function would work.

When you get into more complex aggregates like average or standard deviation or percentile approximation or the like, I'd recommend switching over to some of the two-step aggregates we introduced in the TimescaleDB Toolkit. Specifically, I'd look into the statistical aggregates we recently introduced. They can also do this cumulative sum type thing. (They will only work with DOUBLE PRECISION or things that can be cast to that-ie FLOAT, I'd highly recommend you don't use NUMERIC and instead switch to doubles or floats, doesn't seem like you really need infinite precision calculations here).

You can take a look with some queries I wrote up in this presentation but it might look something like:

CREATE MATERIALIZED VIEW response_times_five_min
WITH (timescaledb.continuous)
AS SELECT api_id,
    time_bucket('1 hour'::interval, ts) as bucket,
    stats_agg(response_time)
FROM response_times
GROUP BY 1, 2;

SELECT bucket, 
    average(rolling(stats_agg) OVER last3), 
    sum(rolling(stats_agg) OVER last3)
FROM response_times_five_min
WHERE api_id = 32
WINDOW last3 as 
(ORDER BY bucket RANGE '3 hours' PRECEDING);

Is it possible to calculate a cumulative sum or moving average with a TimescaleDB continuous aggregate?

Answers (1)

Related Questions