Michał Albrycht
Michał Albrycht

Reputation: 331

Why this query does not trigger partition pruning in Postgres?

I just noticed Postgres (checked on version 13 and 14) behavior that surprised me. I have a simple table volume with id and unique text column name. Second table dir has 3 columns: id, volume_id and path. This table is partitioned by hash on volume_id column. Here is full table schema with sample data:

CREATE TABLE dir (
   id BIGSERIAL,
   volume_id BIGINT,
   path TEXT
) PARTITION BY HASH (volume_id);

CREATE TABLE dir_0
    PARTITION OF dir FOR VALUES WITH (modulus 3, remainder 0);
CREATE TABLE dir_1
    PARTITION OF dir FOR VALUES WITH (modulus 3, remainder 1);
CREATE TABLE dir_2
    PARTITION OF dir FOR VALUES WITH (modulus 3, remainder 2);


CREATE TABLE volume(
    id BIGINT,
    name TEXT UNIQUE
);

INSERT INTO volume (id, name) VALUES (1, 'vol1'), (2, 'vol2'), (3, 'vol3');
INSERT INTO dir (volume_id, path) SELECT i % 3 + 1, 'path_' || i FROM generate_series(1,1000) AS i;

Now, given the volume name, I need to find all the rows from the dir table on that volume. I can do that in 2 different ways. Query #1

EXPLAIN ANALYZE 
SELECT * FROM dir AS d
    INNER JOIN volume AS v ON d.volume_id = v.id
    WHERE v.name = 'vol1';

Which produces query plan:

QUERY PLAN
Hash Join (cost=1.05..31.38 rows=333 width=37) (actual time=0.186..0.302 rows=333 loops=1)
Hash Cond: (d.volume_id = v.id)
-> Append (cost=0.00..24.00 rows=1000 width=24) (actual time=0.006..0.154 rows=1000 loops=1)
-> Seq Scan on dir_0 d_1 (cost=0.00..6.34 rows=334 width=24) (actual time=0.006..0.032 rows=334 loops=1)
-> Seq Scan on dir_1 d_2 (cost=0.00..6.33 rows=333 width=24) (actual time=0.006..0.029 rows=333 loops=1)
-> Seq Scan on dir_2 d_3 (cost=0.00..6.33 rows=333 width=24) (actual time=0.004..0.026 rows=333 loops=1)
-> Hash (cost=1.04..1.04 rows=1 width=13) (actual time=0.007..0.007 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on volume v (cost=0.00..1.04 rows=1 width=13) (actual time=0.003..0.004 rows=1 loops=1)
Filter: (name = 'vol1'::text)
Rows Removed by Filter: 2
Planning Time: 0.500 ms
Execution Time: 0.364 ms

As you can see this query leads to a sequential scan on all 3 partitions of the dir table.


Alternatively, we can write this query like this:

Query #2

EXPLAIN ANALYZE 
SELECT * FROM dir AS d
    WHERE volume_id = (SELECT id FROM volume AS v WHERE v.name = 'vol1');

In that case we get following query plan:

QUERY PLAN
Append (cost=1.04..27.54 rows=1000 width=24) (actual time=0.010..0.066 rows=333 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on volume v (cost=0.00..1.04 rows=1 width=8) (actual time=0.003..0.004 rows=1 loops=1)
Filter: (name = 'vol1'::text)
Rows Removed by Filter: 2
-> Seq Scan on dir_0 d_1 (cost=0.00..7.17 rows=334 width=24) (never executed)
Filter: (volume_id = $0)
-> Seq Scan on dir_1 d_2 (cost=0.00..7.16 rows=333 width=24) (never executed)
Filter: (volume_id = $0)
-> Seq Scan on dir_2 d_3 (cost=0.00..7.16 rows=333 width=24) (actual time=0.004..0.037 rows=333 loops=1)
Filter: (volume_id = $0)
Planning Time: 0.063 ms
Execution Time: 0.093 ms

Here we can see that partitions dir_0 and dir_1 have never executed annotation.


View on DB Fiddle

My question is: Why in the first case is there no partition pruning? Postgres already knows that the volume.name column is unique and that it will translate into a single volume_id. I would like to get a good intuition on when partition pruning can happen during query execution.

Upvotes: 0

Views: 954

Answers (1)

Laurenz Albe
Laurenz Albe

Reputation: 246073

To get partition pruning with a hash join, you'd need to add a condition on d.volume_id to your query. No inference is made from the join with volume.

Your second query shows partition pruning; the "never executed" means that the query executor pruned the scan of certain partitions.

An alternative method that forces a nested loop join and prunes partitions should be

SELECT *
FROM volume AS v
   CROSS JOIN LATERAL (SELECT * FROM dir
                       WHERE dir.volume_id = v.id
                       OFFSET 0) AS d
WHERE v.name = 'vol1';

OFFSET 0 prevents anything except a nested loop join. Different from your query, that should also work if volumn.name is not unique.

Upvotes: 0

Related Questions