Reputation: 43
I have a PostgreSQL table containing time-series data. I want to aggregate values over configurable periods and durations, e.g. aggregate tag-n over 30-minute intervals for one day. The output should look like this:
Period | Max | Min | Avg |
---|---|---|---|
00:00 | xx | xx | xx |
00:30 | xx | xx | xx |
01:00 | xx | xx | xx |
... | ... | ... | ... |
23:30 | xx | xx | xx |
To do this, I created a function to generate the custom intervals and then used it in the query:
```sql
create or replace function interval_generator(start_ts timestamp, end_ts timestamp, round_interval interval)
returns table(row_seq bigint, start_time timestamp, end_time timestamp) as $$
begin
    return query
    select row_number() over (order by n) as row_seq,
           n as start_time,
           n + round_interval as end_time
    from generate_series(date_trunc('minute', start_ts), end_ts, round_interval) n;
end;
$$ language plpgsql;
```
```sql
with intervals as (
    select *
    from interval_generator('2022-08-25 00:00:00', '2022-08-26 23:59:59', '30 minute'::interval)
)
select intervals.row_seq, intervals.start_time, max("Value"), min("Value"), avg("Value")
from public.<TableName>
inner join intervals
   on "TimeStamp" >= intervals.start_time
  and "TimeStamp" <  intervals.end_time
where "Tag" = xxxxxxxxxx
group by intervals.row_seq, intervals.start_time
order by intervals.row_seq asc;
```
I need the additional first column (`row_seq`) for the rest of the application logic. Is this the best way to get the desired output in a single query, or would it be better to split it into multiple queries, one per interval?
I am by no means an expert in SQL or PostgreSQL; I put the queries above together using my Google skills. They work, but I am not sure they are optimized.
Upvotes: 2
Views: 83
Reputation: 16407
I have always found joins on any inequality (>, <, between) to be horribly inefficient. Under the covers it appears to perform a Cartesian join and then filter out rows. I can't swear to that, but benchmarks have shown that you should avoid it if there is any way around it.
In this case, I think you should convert each timestamp to the desired interval using `date_trunc`. Unfortunately there is no native `date_trunc` for 30 minutes, but there is a way around that:
What is the fastest way to truncate timestamps to 5 minutes in Postgres?
Using the concept from the link above, I think you can optimize your query significantly by avoiding the generate_series / Cartesian join:
```sql
select
    date_trunc('hour', "TimeStamp") +
    date_part('minute', "TimeStamp")::int / 30 * interval '30 min' as start_time,
    max("Value"), min("Value"), avg("Value")
from public.<TableName>
where "Tag" = xxxxxxxxxx
group by start_time
order by start_time;
```
If you need the `row_seq` column, I would recommend using `row_number()`.
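As a sketch of that suggestion (reusing the question's table, column, and tag placeholders): since the `OVER` clause cannot reference output aliases, one clean way is to number the buckets in an outer query.

```sql
-- Sketch only: <TableName> and xxxxxxxxxx are the placeholders from the question.
-- The subquery produces one row per 30-minute bucket; the outer query numbers them.
select
    row_number() over (order by start_time) as row_seq,
    start_time,
    max_val, min_val, avg_val
from (
    select
        date_trunc('hour', "TimeStamp") +
        date_part('minute', "TimeStamp")::int / 30 * interval '30 min' as start_time,
        max("Value") as max_val,
        min("Value") as min_val,
        avg("Value") as avg_val
    from public.<TableName>
    where "Tag" = xxxxxxxxxx
    group by start_time
) buckets
order by row_seq;
```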
Upvotes: 1