Reputation: 2560
I have a problem with a view that works as a “directory” of information from other tables. I cannot get PostgreSQL (11.4) to use an index when querying such a view with unions
and left join
.
First, here's a simplified version without the left join
:
create table t1 (
id uuid,
n int,
updated timestamp
);
create index i1 on t1 (updated);
create table t2 (
id uuid,
n int,
updated timestamp
);
create index i2 on t2 (updated);
create or replace view directory as (
select * from t1
union all
select * from t2
);
In the following tests, 10,000 rows were added to each table.
When querying this view with an order by
and a limit
, everything works well, and the i1
and i2
indices are used:
select *
from directory
order by updated desc
limit 10;
Plan: https://lpaste.com/KWcGXlAoEy
However, if a left join
is used in the view, the i2
index will no longer be used:
create table aux2 (
id uuid,
x text
);
create index auxi2 on aux2 (id);
drop view directory;
create view directory as (
select t1.id, t1.n, t1.updated, null from t1
union all
select t2.id, t2.n, t2.updated, a.x from t2
left join aux2 a on a.id = t2.id -- ✱
);
In this case, a sequential scan is done on t2
:
select *
from directory
order by updated desc
limit 10;
Plan:
Limit (cost=916.89..917.19 rows=10 width=44) (actual time=6.128..6.135 rows=10 loops=1)
Buffers: shared hit=141
-> Merge Append (cost=916.89..1516.89 rows=20000 width=44) (actual time=6.127..6.132 rows=10 loops=1)
Sort Key: t1.updated DESC
Buffers: shared hit=141
-> Index Scan Backward using i1 on t1 (cost=0.29..375.29 rows=10000 width=60) (actual time=0.003..0.007 rows=10 loops=1)
Buffers: shared hit=3
-> Sort (cost=916.60..941.60 rows=10000 width=29) (actual time=6.122..6.122 rows=1 loops=1)
Sort Key: t2.updated DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=138
-> Hash Left Join (cost=289.00..600.50 rows=10000 width=29) (actual time=2.240..4.956 rows=10000 loops=1)
Hash Cond: (t2.id = a.id)
Buffers: shared hit=138
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=28) (actual time=0.005..0.807 rows=10000 loops=1)
Buffers: shared hit=74
-> Hash (cost=164.00..164.00 rows=10000 width=17) (actual time=2.227..2.227 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 607kB
Buffers: shared hit=64
-> Seq Scan on aux2 a (cost=0.00..164.00 rows=10000 width=17) (actual time=0.004..1.092 rows=10000 loops=1)
Buffers: shared hit=64
Planning Time: 0.295 ms
Execution Time: 6.161 ms
It's not the left join
alone that precludes PostgreSQL from using i2
here. If the view does not contain a union
(i.e. we only do the select
+ left join
from t2
and aux2
), the i2
index is used.
What's a better way to do this?
I'm both surprised that the planner doesn't use the i2
index, but also that the limit
+ order by
isn't passed down to the node that ends up doing the sequential scan on t2
when the index isn't used. (On the real system, all rows from “t2” are returned, only to have most of them thrown away by the limit
at the very end).
Upvotes: 0
Views: 262
Reputation: 247545
Obviously the optimizer isn't smart enough to do the best thing if there is a UNION ALL
involved.
One solution is to write the query in a more complicated way that helps the optimizer to figure out what is the best thing to do. You have to do without the view for that:
SELECT *
FROM ((SELECT t1.id, t1.n, t1.updated, null
FROM t1
ORDER BY t1.updated DESC
LIMIT 10)
UNION ALL
(SELECT t2.id, t2.n, t2.updated, a.x
FROM t2
LEFT JOIN aux2 a ON a.id = t2.id
ORDER BY t2.updated DESC
LIMIT 10)
) AS q
ORDER BY updated DESC
LIMIT 10;
Upvotes: 2