Chad Showalter
Chad Showalter

Reputation: 401

Is it possible to calculate a cumulative sum or moving average with a TimescaleDB continuous aggregate?

Consider a table with 2 columns:

create table foo
(
    ts             timestamp,
    precipitation  numeric,
    primary key (ts)
);

with the following data:

ts precipitation
2021-06-01 12:00:00 1
2021-06-01 13:00:00 0
2021-06-01 14:00:00 2
2021-06-01 15:00:00 3

I would like to use a TimescaleDB continuous aggregate to calculate a three hour cumulative sum of this data that is calculated once per hour. Using the example data above, my continuous aggregate would contain

ts cum_precipitation
2021-06-01 12:00:00 1
2021-06-01 13:00:00 1
2021-06-01 14:00:00 3
2021-06-01 15:00:00 5

I can't see a way to do this with the supported syntax for continuous aggregrates. Am I missing something? Essentially, I would like the time bucket to be the preceding x hours, but the calculation to occur hourly.

Upvotes: 2

Views: 2160

Answers (1)

davidk
davidk

Reputation: 1053

Good question!

You can do this by calculating a normal continuous aggregate and then doing a window function over it. So, calculate a sum() for each hour and then do sum() as a window function would work.

When you get into more complex aggregates like average or standard deviation or percentile approximation or the like, I'd recommend switching over to some of the two-step aggregates we introduced in the TimescaleDB Toolkit. Specifically, I'd look into the statistical aggregates we recently introduced. They can also do this cumulative sum type thing. (They will only work with DOUBLE PRECISION or things that can be cast to that-ie FLOAT, I'd highly recommend you don't use NUMERIC and instead switch to doubles or floats, doesn't seem like you really need infinite precision calculations here).

You can take a look with some queries I wrote up in this presentation but it might look something like:

CREATE MATERIALIZED VIEW response_times_five_min
WITH (timescaledb.continuous)
AS SELECT api_id,
    time_bucket('1 hour'::interval, ts) as bucket,
    stats_agg(response_time)
FROM response_times
GROUP BY 1, 2;

SELECT bucket, 
    average(rolling(stats_agg) OVER last3), 
    sum(rolling(stats_agg) OVER last3)
FROM response_times_five_min
WHERE api_id = 32
WINDOW last3 as 
(ORDER BY bucket RANGE '3 hours' PRECEDING);

Upvotes: 3

Related Questions