Sanprof
Sanprof

Reputation: 409

TimescaleDB continuous aggregation custom time_bucket function

We have a device for temperature collecting, it is configured to collect the temperature of the refrigerator box each second in the buffer (but can collect even fewer periods 10ms, 100ms) and at the end of each minute it sends an average value for the last minute (but it can be required to send average temperature per each second or per every 10 seconds). We have a hyper_table for these data. And we want to create some amount of continuous aggregation (with required intervals but for now it is 5min) for this table as it grows continuously. The issue we faced during the creation of materialized view is that the time_bucket function uses the date_bin function which always uses the start_date as basis. It means that time_bucket will always use the next statement time >= start_date and time < end_date which is wrong for us because we need time > start_date and time <= end_date. You can see it in the picture. enter image description here I created a very simple sample to show what I meant

-- create table
drop table if exists temp_measure;
create table if not exists temp_measure(
    time timestamptz,
    "avg_temp" double precision
);

insert into temp_measure select '2023-06-27 05:59:00+00', 6.9;
insert into temp_measure select '2023-06-27 06:00:00+00', 7.2;
insert into temp_measure select '2023-06-27 06:01:00+00', 15.1;
insert into temp_measure select '2023-06-27 06:02:00+00', 10.6;
insert into temp_measure select '2023-06-27 06:03:00+00', 8.5;
insert into temp_measure select '2023-06-27 06:04:00+00', 12;
insert into temp_measure select '2023-06-27 06:05:00+00', 13.2;
insert into temp_measure select '2023-06-27 06:06:00+00', 9.7;
insert into temp_measure select '2023-06-27 06:07:00+00', 8.1;
insert into temp_measure select '2023-06-27 06:08:00+00', 7.9;
insert into temp_measure select '2023-06-27 06:09:00+00', 8.5;
insert into temp_measure select '2023-06-27 06:10:00+00', 7.7;

-- create hypertable
SELECT create_hypertable('temp_measure', 'time', if_not_exists => TRUE, chunk_time_interval => INTERVAL '1 day', migrate_data => true);

-- create materialized view
CREATE MATERIALIZED VIEW temp_measure_5min
WITH (timescaledb.continuous)
AS
SELECT
    time_bucket('5 minutes', time) as bucket,
    ROUND(AVG(avg_temp)::decimal,1)::double precision as avg_temp
FROM
    temp_measure
GROUP BY bucket
WITH NO DATA;

Now lets see what we have at the end

SELECT bucket, avg_temp FROM public.temp_measure_5min order by bucket;

bucket  avg_temp
2023-06-27 07:55:00+02  6.9
2023-06-27 08:00:00+02  10.7
2023-06-27 08:05:00+02  9.5
2023-06-27 08:10:00+02  7.7

This is wrong for us and we have own window function which is correct for us

CREATE OR REPLACE FUNCTION public.time_window(
    _interval interval,
    _time timestamptz,
    _date_from timestamptz)
    RETURNS timestamptz
    LANGUAGE 'plpgsql'
AS $BODY$
begin

    RETURN
        CASE
            WHEN date_bin(_interval, _time, _date_from) = _time THEN _time
            ELSE (date_bin(_interval, _time, _date_from) + _interval)::timestamptz
        END;

end;
$BODY$;

And now if we use our time_window function instead of time_bucket I see the correct result for us

SELECT
    time_window('5 minutes', time, '2001-01-01') as bucket,
    ROUND(AVG(avg_temp)::decimal,1)::double precision as avg_temp
FROM
    temp_measure
GROUP BY bucket
order by bucket;

bucket  avg_temp
2023-06-27 08:00:00+02  7.1
2023-06-27 08:05:00+02  11.9
2023-06-27 08:10:00+02  8.4

It seems we cannot use continuous aggregation for the current version of TimescaleDB because of incorrect time buckets


I provide system info

Upvotes: 3

Views: 306

Answers (0)

Related Questions