Reputation: 1598
I have a situation that I would like to better understand:
I have a table t with two columns and one index:
CREATE TABLE t (
    refid   BIGINT NOT NULL,
    created TIMESTAMPTZ NOT NULL
);
CREATE INDEX t_refid_created ON t (refid, created);
In order to get the latest row (the one with the highest created value) for each distinct refid, I composed two queries:
-- index only scan t_refid_created_desc_idx
SELECT DISTINCT ON (refid) * FROM t
ORDER BY refid, created DESC;
-- index scan t_refid_created_idx
SELECT refid, max(created) FROM t GROUP BY refid;
When t has about 16M rows and only about 500 distinct refid values, the second query returns substantially faster than the first one.
At first I figured that because I'm ordering by created DESC, it needs to do a backwards index scan, and that this is expensive when starting from a column with high variance (created). So I added the following index:
CREATE index t_refid_created_desc_idx ON t (refid, created DESC);
It was indeed used (instead of the backwards scan on the previous index) but there was no improvement.
If I understand correctly, the second query would aggregate by refid and then scan each group to find the max created value. That sounds like a lot of work.
The first query, to the best of my understanding, should have simply iterated over the first part of the index, and then for each refid it should have used the second part of the index, taking the first value.
Obviously this is not the case, and the SELECT DISTINCT query takes about four times as long as the GROUP BY one.
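The plan I had in mind is, as far as I can tell, what is sometimes called a loose index scan. PostgreSQL does not perform it natively, but just to illustrate what I expected (a sketch, not one of the two queries above), it can be emulated with a recursive CTE that probes the index once per refid:
WITH RECURSIVE latest AS (
    -- seed: the smallest refid together with its highest created
    (SELECT refid, created FROM t
     ORDER BY refid, created DESC
     LIMIT 1)
    UNION ALL
    -- step: jump to the next refid, again taking its highest created
    SELECT n.refid, n.created
    FROM latest l
    CROSS JOIN LATERAL (
        SELECT refid, created FROM t
        WHERE refid > l.refid
        ORDER BY refid, created DESC
        LIMIT 1
    ) n
)
SELECT * FROM latest;
Each step should be a single probe of t_refid_created_desc_idx, so with ~500 distinct values this ought to cost a few hundred index probes instead of reading 16M rows.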
What am I missing here?
Here are the EXPLAIN ANALYZE outputs for the first and second queries:
Unique  (cost=0.56..850119.78 rows=291 width=16) (actual time=0.103..13414.913 rows=469 loops=1)
  ->  Index Only Scan using t_refid_created_desc_idx on t  (cost=0.56..808518.47 rows=16640527 width=16) (actual time=0.102..12113.454 rows=16640527 loops=1)
        Heap Fetches: 16640527
Planning time: 0.157 ms
Execution time: 13415.047 ms
Finalize GroupAggregate  (cost=599925.13..599932.41 rows=291 width=16) (actual time=3454.350..3454.884 rows=469 loops=1)
  Group Key: refid
  ->  Sort  (cost=599925.13..599926.59 rows=582 width=16) (actual time=3454.344..3454.509 rows=1372 loops=1)
        Sort Key: refid
        Sort Method: quicksort  Memory: 113kB
        ->  Gather  (cost=599837.29..599898.40 rows=582 width=16) (actual time=3453.194..3560.602 rows=1372 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Partial HashAggregate  (cost=598837.29..598840.20 rows=291 width=16) (actual time=3448.225..3448.357 rows=457 loops=3)
                    Group Key: refid
                    ->  Parallel Seq Scan on t  (cost=0.00..564169.53 rows=6933553 width=16) (actual time=0.047..2164.459 rows=5546842 loops=3)
Planning time: 0.157 ms
Execution time: 3561.727 ms
The first query runs in about 13 seconds, while the second one achieves the same results in under 4 seconds! And without even using the index!
I'm using PostgreSQL 10.5.
Upvotes: 2
Views: 230
Reputation: 247445
I cannot answer the riddle of why the DISTINCT ON query does not consider the second plan; from the cost estimates, we see that PostgreSQL considers that plan cheaper. My guess is that nobody has implemented pushing DISTINCT down into parallel plans. You could ask the mailing list.
However, the problem with the first query is the 16 million heap fetches. This means that it is effectively a normal index scan, not an index-only one! It looks like a bad misestimate on the part of the planner.
If I am right, a VACUUM on the table that cleans the visibility map should improve the first query considerably.
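As a minimal sketch (the pg_class query is just a quick way to check visibility map coverage, not required for the fix):
VACUUM t;
-- relallvisible close to relpages means the index-only scan
-- can answer from the index without heap fetches
SELECT relpages, relallvisible FROM pg_class WHERE relname = 't';
Afterwards, the first plan should report far fewer heap fetches.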
Upvotes: 1