AndreasInfo
AndreasInfo

Reputation: 1227

Continuous aggregates in postgres/timescaledb requires time_bucket-function?

I have a SELECT-query which gives me the aggregated sum(minutes_per_hour_used) of some stuff. Grouped by id, weekday and observed hour.

SELECT id,
       extract(dow from observed_date) AS weekday, (  --observed_date  is type date
       observed_hour,  -- is type timestamp without timezone, every full hour 00:00:00, 01:00:00, ...
       sum(minutes_per_hour_used)
FROM base_table
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;

The result looks nice, but now I would like to store that in a self-maintained view, which only considers/aggregates the last 8 weeks. I thought contiouus aggregates are the right way, but I can't make it work (https://blog.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/). It seems I need to somehow use the time_bucket-function, but actually I don't know how. Any ideas/hints?

I am using postgres with timescaledb.

EDIT: This gives me the desired output, but I can't put it in a continouus aggregate

SELECT id,
       extract(dow from observed_date) AS weekday,
       observed_hour,
       sum(minutes_per_hour_used)
FROM base_table
WHERE observed_date >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;

EDIT: Prepend this with

CREATE VIEW my_view
    WITH (timescaledb.continuous) AS

gives me [0A000] ERROR: invalid SELECT query for continuous aggregate

Upvotes: 3

Views: 5960

Answers (1)

k_rus
k_rus

Reputation: 3219

Continuous aggregates require grouping by time_bucket:

SELECT <grouping_exprs>, <aggregate_functions>
    FROM <hypertable>
[WHERE ... ]
GROUP BY time_bucket( <const_value>, <partition_col_of_hypertable> ),
         [ optional grouping exprs>]
[HAVING ...]

It should be applied to a partitioned column, which is usually the time dimension column used in the hypertable creation. Also ORDER BY is not supported.

In the case of the aggregate query in the question no time column is used for grouping. Neither weekday nor observed_hour are time valid columns, since they don't increase as time, instead their values are repeat regularly. weekday repeats every 7 days and observed_hour repeats every 24 hours. This breaks requirements for continuous aggregates.

Since there is no ready solution for this use case, one approach is to use a continuous aggregate to reduce the amount of data for the targeted query, e.g., by bucketing by day:

CREATE MATERIALIZED VIEW daily
WITH (timescaledb.continuous) AS
SELECT id,
       time_bucket('1day', observed_date) AS day,
       observed_hour,
       sum(minutes_per_hour_used)
FROM base_table
GROUP BY 1, 2, 3;

Then execute the targeted aggregate query on top of it:

SELECT id,
       extract(dow from day) AS weekday,
       observed_hour,
       sum(minutes_per_hour_used)
FROM daily
WHERE day >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;

Another approach is to use PostgreSQL's materialized views and refresh it on regular basis with help of custom jobs, which is run by the job scheduling framework of TimescaleDB. Note that the refresh will re-calculate entire view, which in the example case covers 8 weeks of data. The materialized view can be written in terms of the original table base_table or in terms of the continuous aggregate suggested above.

Upvotes: 5

Related Questions