Reputation: 2217
I am performing the following query with a self join:
with t as (
SELECT *, TIMESTAMP 'epoch' + tstamp * INTERVAL '1 second' as tstamp2
FROM
mytable
WHERE id = 'a'
LIMIT 1000
)
select v1.id as id, date_trunc('hour', v1.tstamp2) as hour, v1.value as start, v2.value as stop
from
t v1 join
t v2
on v1.id = v2.id and
date_trunc('hour', v1.tstamp2) = date_trunc('hour', v2.tstamp2) and
v1.tstamp2 < v2.tstamp2
where 1=1
limit 100;
The table looks like that:
id tstamp value tstamp2
My goal is to output all the combination of "value" within the same hour for one id. I have 100.000 unique ids and millions of rows. This is extremely slow and inefficient. Is there a way to break the query so the self join operates on time partitions (hour by hour for example) to improve speed of such query?
I have 100.000 unique ids and millions of rows.
EDIT: I found this which seems to be what I want to do but no idea how to implement that:
If you know more than you've let on about the properties of the intervals, you might be able to improve things. For instance if the intervals fall into nonoverlapping buckets then you could add a constraint that the buckets of the two sides are equal. Postgres is a lot better with equality join constraints than it is with range constraints, so it would be able to match up rows and only do the O(N^2) work within each bucket.
Upvotes: 1
Views: 1424
Reputation: 1269873
This answers the question as originally tagged -- "Postgres", not "Redshift".
Unfortunately, Postgres materializes CTEs, which then precludes the use of indexes. You have no ORDER BY
in the CTE, so arbitrary rows are being chosen.
One solution is a temporary table and indexes:
CREATE TEMPORARY TABLE t as
SELECT t.*,
TIMESTAMP 'epoch' + tstamp * INTERVAL '1 second' as tstamp2,
DATE_TRUNC('hour', 'epoch' + tstamp * INTERVAL '1 second') as tstamp2_hour
FROM mytable t
WHERE t.id = 'a'
LIMIT 1000;
CREATE INDEX t_id_hour_tstamp2 ON t(id, tstamp2_hour, tstamp2);
select v1.id as id, v1.tstamp2_hour as hour, v1.value as start, v2.value as stop
from t v1 join
t v2
on v1.id = v2.id and
v1.tstamp2_hour = v2.tstamp2_hour and
v1.tstamp2 < v2.tstamp2
limit 100;
Upvotes: 1