Speeding up query with `order by`, `limit` and `union`

Question

I have a problem with a view that works as a “directory” of information from other tables. I cannot get PostgreSQL (11.4) to use an index when querying such a view with unions and left join.

First, here's a simplified version without the left join:

create table t1 (
  id uuid,
  n int,
  updated timestamp
);
create index i1 on t1 (updated);

create table t2 (
  id uuid,
  n int,
  updated timestamp
);
create index i2 on t2 (updated);

create or replace view directory as (
    select * from t1
  union all
    select * from t2
);

In the following tests, 10,000 rows were added to each table.

When querying this view with an order by and a limit, everything works well, and the i1 and i2 indices are used:

select *
from directory
order by updated desc
limit 10;

Plan visualized by PEV

Plan: https://lpaste.com/KWcGXlAoEy

However, if a left join is used in the view, the i2 index will no longer be used:

create table aux2 (
  id uuid,
  x text
);
create index auxi2 on aux2 (id);

drop view directory;
create view directory as (
    select t1.id, t1.n, t1.updated, null from t1
  union all
    select t2.id, t2.n, t2.updated, a.x from t2
    left join aux2 a on a.id = t2.id              -- ✱
);

In this case, a sequential scan is done on t2:

select *
from directory
order by updated desc
limit 10;

Plan visualized by PEV

Plan:

Limit  (cost=916.89..917.19 rows=10 width=44) (actual time=6.128..6.135 rows=10 loops=1)
  Buffers: shared hit=141
  ->  Merge Append  (cost=916.89..1516.89 rows=20000 width=44) (actual time=6.127..6.132 rows=10 loops=1)
        Sort Key: t1.updated DESC
        Buffers: shared hit=141
        ->  Index Scan Backward using i1 on t1  (cost=0.29..375.29 rows=10000 width=60) (actual time=0.003..0.007 rows=10 loops=1)
              Buffers: shared hit=3
        ->  Sort  (cost=916.60..941.60 rows=10000 width=29) (actual time=6.122..6.122 rows=1 loops=1)
              Sort Key: t2.updated DESC
              Sort Method: top-N heapsort  Memory: 25kB
              Buffers: shared hit=138
              ->  Hash Left Join  (cost=289.00..600.50 rows=10000 width=29) (actual time=2.240..4.956 rows=10000 loops=1)
                    Hash Cond: (t2.id = a.id)
                    Buffers: shared hit=138
                    ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=28) (actual time=0.005..0.807 rows=10000 loops=1)
                          Buffers: shared hit=74
                    ->  Hash  (cost=164.00..164.00 rows=10000 width=17) (actual time=2.227..2.227 rows=10000 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 607kB
                          Buffers: shared hit=64
                          ->  Seq Scan on aux2 a  (cost=0.00..164.00 rows=10000 width=17) (actual time=0.004..1.092 rows=10000 loops=1)
                                Buffers: shared hit=64
Planning Time: 0.295 ms
Execution Time: 6.161 ms

It's not the left join alone that precludes PostgreSQL from using i2 here. If the view does not contain a union (i.e. we only do the select + left join from t2 and aux2), the i2 index is used.

What's a better way to do this?

I'm both surprised that the planner doesn't use the i2 index, but also that the limit + order by isn't passed down to the node that ends up doing the sequential scan on t2 when the index isn't used. (On the real system, all rows from “t2” are returned, only to have most of them thrown away by the limit at the very end).

Speeding up query with `order by`, `limit` and `union`

Answers (1)

Related Questions