Reputation: 4918
I've a query that joins two tables:
- data has a timestamp column
- orders has start and end timestamp columns

On table 1 I have an index on the timestamp column and on table 2 I have an index on (start, end), in that order. Everything works fine and is fast when I simply join them like FROM d JOIN o ON d.timestamp > o.start_ts AND d.timestamp <= o.end_ts
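A minimal sketch of that setup, with assumed index names (the actual table and column names may differ):
CREATE INDEX data_timestamp_idx ON data ("timestamp");           -- index on the data timestamp
CREATE INDEX orders_start_end_idx ON orders (start_ts, end_ts);  -- composite index: start first, then end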
The problem is that when an order is not completed, end_ts is null, so I need to change my join condition to d.timestamp <= COALESCE(o.end_ts, now()). From the explain I can see that the index scan now includes a Filter step that makes it way slower (from 1s to 14s).
The query I'm currently using is:
SELECT *
FROM data pd
JOIN orders o ON pd.timestamp > o.start_time
             AND pd.timestamp <= COALESCE(o.end_time, now())
WHERE pd.machine_id = 19
  AND pd.name = 'somevalue'
  AND pd.timestamp > '2019-09-15 22:00:00'
  AND pd.timestamp <= '2019-09-22 21:59:59.999'
The explain output is:
Nested Loop  (cost=0.72..162190.37 rows=1647921 width=187) (actual time=2.589..18590.362 rows=7775 loops=1)
  Buffers: shared hit=44030111 read=3348
  I/O Timings: read=20.984
  ->  Index Scan using data_timestamp_machine_id_name_unique on data pd  (cost=0.43..20237.60 rows=1730 width=81) (actual time=0.055..50.357 rows=7713 loops=1)
        Index Cond: (("timestamp" > '2019-09-15 22:00:00-07'::timestamp with time zone) AND ("timestamp" <= '2019-09-22 21:59:59.999-07'::timestamp with time zone) AND (machine_id = 19) AND ((name)::text = 'weight'::text))
        Buffers: shared hit=5132 read=3216
        I/O Timings: read=20.591
  ->  Index Scan using orders_machine_id_idx on orders o  (cost=0.29..72.52 rows=953 width=106) (actual time=2.401..2.401 rows=1 loops=7713)
        Index Cond: ((machine_id = 19) AND (pd."timestamp" > start_time))
        Filter: (pd."timestamp" <= COALESCE(end_time, now()))
        Rows Removed by Filter: 7108
        Buffers: shared hit=44024979 read=132
        I/O Timings: read=0.393
Planning Time: 0.191 ms
Execution Time: 18591.568 ms
Upvotes: 0
Views: 657
Reputation: 44250
You can avoid the special handling of NULLs by using NOT(negated_condition). (But the infinity solution is cleaner, IMO.)
SELECT *
FROM data pd
JOIN orders o ON pd.timestamp > o.start_time
  AND NOT (pd.timestamp > o.end_time)
WHERE pd.machine_id = 19
AND pd.name = 'somevalue'
AND pd.timestamp > '2019-09-15 22:00:00'
AND pd.timestamp <= '2019-09-22 21:59:59.999'
;
Upvotes: 1
Reputation:
This is one of the cases where range types come in handy as they treat NULL values properly.
Create an index on the range:
CREATE INDEX orders_range ON orders using gist (tsrange(start_time,end_time,'(]'));
Then use the same range definition in the join condition:
SELECT *
FROM data pd
JOIN orders o ON pd.timestamp <@ tsrange(o.start_time, o.end_time, '(]')
WHERE pd.machine_id = 19
AND pd.name = 'somevalue'
  AND pd.timestamp > '2019-09-15 22:00:00'
  AND pd.timestamp <= '2019-09-22 21:59:59.999'
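For example, a NULL upper bound makes the range unbounded, so an order that is still open contains any later timestamp (a quick check with literal values):
SELECT timestamp '2019-09-20 12:00:00' <@ tsrange('2019-09-15 22:00:00', NULL, '(]');  -- returns true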
Unrelated, but: I would use pd.timestamp < '2019-09-22 22:00:00' instead of a <= operator with a timestamp that is a few milliseconds before 22:00:00.
Upvotes: 2
Reputation: 4918
Thanks to incognito on IRC I've found a viable solution: create an index like this:
CREATE INDEX orders_timestamps ON orders (start_time, coalesce(end_time, 'infinity'));
and change the join condition to pd.timestamp <= COALESCE(o.end_time, 'infinity').
This way I still have my nullable column and an indexed expression I can use, since I couldn't index NOW() because it isn't immutable.
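For reference, a sketch of the adjusted query with that index (same table and column names as above); the COALESCE expression in the join has to match the indexed expression for the planner to use it:
SELECT *
FROM data pd
JOIN orders o ON pd.timestamp > o.start_time
             AND pd.timestamp <= COALESCE(o.end_time, 'infinity')
WHERE pd.machine_id = 19
  AND pd.name = 'somevalue'
  AND pd.timestamp > '2019-09-15 22:00:00'
  AND pd.timestamp <= '2019-09-22 21:59:59.999';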
Upvotes: 0