user1681857

Reputation: 43

How can I optimize this PostgreSQL query?

I have a PostgreSQL table that holds time-series data. I want to aggregate values over configurable periods and durations, e.g. aggregate tag-n over 30-minute intervals for one day. The output should look like this:

Period  Max  Min  Avg
00:00   xx   xx   xx
00:30   xx   xx   xx
01:00   xx   xx   xx
...     ...  ...  ...
23:30   xx   xx   xx

To do this, I created a function to generate the custom intervals and then used it in the query:

create or replace function interval_generator(start_ts timestamp, end_ts timestamp, round_interval interval)
returns table(row_seq bigint, start_time timestamp, end_time timestamp) as $$
begin
  return query
    -- one numbered row per interval between start_ts and end_ts
    select row_number() over () as row_seq,
           n                    as start_time,
           n + round_interval   as end_time
    from generate_series(date_trunc('minute', start_ts), end_ts, round_interval) as n;
end $$
language plpgsql;
with intervals as (
  select * from interval_generator('2022-08-25 00:00:00', '2022-08-26 23:59:59', '30 minute'::interval)
)
select intervals.row_seq, intervals.start_time,
       max("Value"), min("Value"), avg("Value")
from public.<TableName>
join intervals on "TimeStamp" >= intervals.start_time
              and "TimeStamp" < intervals.end_time
where "Tag" = xxxxxxxxxx
group by intervals.row_seq, intervals.start_time
order by intervals.row_seq;

I need the additional first column (row_seq) for the rest of the application logic. Is this the best way to get the desired output in a single query? Or would splitting it into multiple queries, one per interval, be better?

I am in no way an expert in SQL or PostgreSQL; I put the above queries together using my Google skills. They work, but I am not sure they are optimized.

Upvotes: 2

Views: 83

Answers (1)

Hambone

Reputation: 16407

I have always found joining on any inequality (>, <, between) to be horribly inefficient. Under the covers it appears to do a Cartesian join and then filter out rows. I can't swear to that, but benchmarking has shown that if there is any way around it, you should take it.
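If you want to check this against your own data, you can prefix the original query with EXPLAIN and look at the join strategy the planner picks (plans vary by PostgreSQL version and table statistics, so treat this as a way to look rather than a guaranteed result):

-- Show the plan for the inequality join; a nested loop over the generated
-- intervals would be consistent with the behavior described above.
explain (analyze, buffers)
with intervals as (
  select * from interval_generator('2022-08-25 00:00:00', '2022-08-26 23:59:59', '30 minute'::interval)
)
select intervals.row_seq, intervals.start_time,
       max("Value"), min("Value"), avg("Value")
from public.<TableName>
join intervals on "TimeStamp" >= intervals.start_time
              and "TimeStamp" < intervals.end_time
where "Tag" = xxxxxxxxxx
group by intervals.row_seq, intervals.start_time;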

In this case, I think you should convert each timestamp to the desired interval using date_trunc. Unfortunately there is no native date_trunc for 30 minutes, but there is a way around that:

What is the fastest way to truncate timestamps to 5 minutes in Postgres?

Using the concept in the link above, I think you can optimize your query significantly by avoiding the generate_series / Cartesian join entirely:

select
  -- truncate to the hour, then add back as many whole 30-minute blocks
  -- as have elapsed within that hour
  date_trunc('hour', "TimeStamp") +
    date_part('minute', "TimeStamp")::int / 30 * interval '30 min' as start_time,
  max("Value"), min("Value"), avg("Value")
from public.<TableName>
where "Tag" = xxxxxxxxxx
group by start_time
order by start_time

If you need the row_seq, I would recommend adding it with row_number() over the grouped results.
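For example, a minimal sketch that wraps the query above in a CTE (the max_value/min_value/avg_value aliases are just illustrative; the table and tag placeholders are the question's):

with agg as (
  select
    date_trunc('hour', "TimeStamp") +
      date_part('minute', "TimeStamp")::int / 30 * interval '30 min' as start_time,
    max("Value") as max_value,
    min("Value") as min_value,
    avg("Value") as avg_value
  from public.<TableName>
  where "Tag" = xxxxxxxxxx
  group by start_time
)
select
  row_number() over (order by start_time) as row_seq,  -- 1, 2, 3, ... per interval
  start_time, max_value, min_value, avg_value
from agg
order by row_seq;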

Upvotes: 1
