Reputation: 1750
Environment
My PostgreSQL (9.2) schema looks like this:
CREATE TABLE first
(
id_first bigint NOT NULL,
first_date timestamp without time zone NOT NULL,
CONSTRAINT first_pkey PRIMARY KEY (id_first)
)
WITH (
OIDS=FALSE
);
CREATE INDEX first_first_date_idx
ON first
USING btree
(first_date);
CREATE TABLE second
(
id_second bigint NOT NULL,
id_first bigint NOT NULL,
CONSTRAINT second_pkey PRIMARY KEY (id_second),
CONSTRAINT fk_first FOREIGN KEY (id_first)
REFERENCES first (id_first) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
CREATE INDEX second_id_first_idx
ON second
USING btree
(id_first);
CREATE TABLE third
(
id_third bigint NOT NULL,
id_second bigint NOT NULL,
CONSTRAINT third_pkey PRIMARY KEY (id_third),
CONSTRAINT fk_second FOREIGN KEY (id_second)
REFERENCES second (id_second) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
CREATE INDEX third_id_second_idx
ON third
USING btree
(id_second);
So, I have 3 tables, each with its own PK. First has an index on first_date, Second has a FK to First with an index on it, and Third has a FK to Second with an index on it as well:
First (0 --> n) Second (0 --> n) Third
The First table contains about 10 000 000 records.
The Second table contains about 20 000 000 records.
The Third table contains about 18 000 000 records.
The date range in column first_date is from 2016-01-01 till today.
random_page_cost is set to 2.0.
default_statistics_target is set to 100.
The STATISTICS targets on all FK, PK and first_date columns are set to 5000.
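For reference, those targets were raised along these lines (a sketch; the column list is reconstructed from the schema above):
-- Raise the per-column sampling targets, then re-ANALYZE
-- so the planner picks up the finer statistics.
ALTER TABLE first ALTER COLUMN first_date SET STATISTICS 5000;
ALTER TABLE second ALTER COLUMN id_first SET STATISTICS 5000;
ALTER TABLE third ALTER COLUMN id_second SET STATISTICS 5000;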
Task to do
I want to count all Third rows connected with First where first_date < X.
My query:
SELECT count(t.id_third) AS count
FROM first f
JOIN second s ON s.id_first = f.id_first
JOIN third t ON t.id_second = s.id_second
WHERE first_date < _my_date
Problem description
_my_date = '2016-01-03'
Everything works pretty well; the query takes 1-2 seconds.
EXPLAIN ANALYZE:
"Aggregate (cost=8585512.55..8585512.56 rows=1 width=8) (actual time=67.310..67.310 rows=1 loops=1)"
" -> Merge Join (cost=4208477.49..8583088.04 rows=969805 width=8) (actual time=44.277..65.948 rows=17631 loops=1)"
" Merge Cond: (s.id_second = t.id_second)"
" -> Sort (cost=4208477.48..4211121.75 rows=1057709 width=8) (actual time=44.263..46.035 rows=19230 loops=1)"
" Sort Key: s.id_second"
" Sort Method: quicksort Memory: 1670kB"
" -> Nested Loop (cost=0.01..4092310.41 rows=1057709 width=8) (actual time=6.169..39.183 rows=19230 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..483786.81 rows=492376 width=8) (actual time=6.159..12.223 rows=10346 loops=1)"
" Index Cond: (first_date < '2016-01-03 00:00:00'::timestamp without time zone)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..7.26 rows=7 width=16) (actual time=0.002..0.002 rows=2 loops=10346)"
" Index Cond: (id_first = f.id_first)"
" -> Index Scan using third_id_second_idx on third t (cost=0.00..4316649.89 rows=17193788 width=16) (actual time=0.008..7.293 rows=17632 loops=1)"
"Total runtime: 67.369 ms"
_my_date = '2016-01-11' or later
The query is not using an index scan anymore - it is replaced by a seq scan and takes 3-4 minutes...
Query plan:
"Aggregate (cost=8731468.75..8731468.76 rows=1 width=8) (actual time=234411.229..234411.229 rows=1 loops=1)"
" -> Hash Join (cost=4352424.81..8728697.88 rows=1108348 width=8) (actual time=189670.068..234400.540 rows=138246 loops=1)"
" Hash Cond: (t.id_second = o.id_second)"
" -> Seq Scan on third t (cost=0.00..4128080.88 rows=17193788 width=16) (actual time=0.016..124111.453 rows=17570724 loops=1)"
" -> Hash (cost=4332592.69..4332592.69 rows=1208810 width=8) (actual time=98566.740..98566.740 rows=151263 loops=1)"
" Buckets: 16384 Batches: 16 Memory Usage: 378kB"
" -> Hash Join (cost=561918.25..4332592.69 rows=1208810 width=8) (actual time=6535.801..98535.915 rows=151263 loops=1)"
" Hash Cond: (s.id_first = f.id_first)"
" -> Seq Scan on second s (cost=0.00..3432617.48 rows=18752248 width=16) (actual time=6090.771..88891.691 rows=19132869 loops=1)"
" -> Hash (cost=552685.31..552685.31 rows=562715 width=8) (actual time=444.630..444.630 rows=81650 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..552685.31 rows=562715 width=8) (actual time=7.987..421.087 rows=81650 loops=1)"
" Index Cond: (first_date < '2016-01-13 00:00:00'::timestamp without time zone)"
"Total runtime: 234411.303 ms"
For test purposes, I have set:
SET enable_seqscan = OFF;
My queries start using index scans again and take 1-10 s (depending on the range).
Question
Why does it work like that? How can I convince the query planner to use an index scan?
EDIT
After reducing random_page_cost to 1.1, I can now select about 30 days while still getting an index scan. The query plan changed a little bit:
"Aggregate (cost=8071389.47..8071389.48 rows=1 width=8) (actual time=4915.196..4915.196 rows=1 loops=1)"
" -> Nested Loop (cost=0.01..8067832.28 rows=1422878 width=8) (actual time=14.402..4866.937 rows=399184 loops=1)"
" -> Nested Loop (cost=0.01..3492321.55 rows=1551849 width=8) (actual time=14.393..3012.617 rows=436794 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..432541.99 rows=722404 width=8) (actual time=14.372..729.233 rows=236007 loops=1)"
" Index Cond: (first_date < '2016-02-01 00:00:00'::timestamp without time zone)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..4.17 rows=7 width=16) (actual time=0.008..0.009 rows=2 loops=236007)"
" Index Cond: (second = f.id_second)"
" -> Index Scan using third_id_second_idx on third t (cost=0.00..2.94 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=436794)"
" Index Cond: (id_second = s.id_second)"
"Total runtime: 4915.254 ms"
However, I still don't get why asking for a larger range causes a seq scan...
Interestingly, when I ask for a range just above some kind of limit, I get a query plan like this (here a select for 40 days - asking for more produces a full seq scan again):
"Aggregate (cost=8403399.27..8403399.28 rows=1 width=8) (actual time=138303.216..138303.217 rows=1 loops=1)"
" -> Hash Join (cost=3887619.07..8399467.63 rows=1572656 width=8) (actual time=44056.443..138261.203 rows=512062 loops=1)"
" Hash Cond: (t.id_second = s.id_second)"
" -> Seq Scan on third t (cost=0.00..4128080.88 rows=17193788 width=16) (actual time=0.004..119497.056 rows=17570724 loops=1)"
" -> Hash (cost=3859478.04..3859478.04 rows=1715203 width=8) (actual time=5695.077..5695.077 rows=560503 loops=1)"
" Buckets: 16384 Batches: 16 Memory Usage: 1390kB"
" -> Nested Loop (cost=0.01..3859478.04 rows=1715203 width=8) (actual time=65.250..5533.413 rows=560503 loops=1)"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..477985.28 rows=798447 width=8) (actual time=64.927..1688.341 rows=302663 loops=1)"
" Index Cond: (first_date < '2016-02-11 00:00:00'::timestamp without time zone)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..4.17 rows=7 width=16) (actual time=0.010..0.012 rows=2 loops=302663)"
" Index Cond: (id_first = f.id_first)"
"Total runtime: 138303.306 ms"
UPDATE after Laurenz Albe's suggestions
After rewriting the query as Laurenz Albe suggested:
"Aggregate (cost=9102321.05..9102321.06 rows=1 width=8) (actual time=15237.830..15237.830 rows=1 loops=1)"
" -> Merge Join (cost=4578171.25..9097528.19 rows=1917143 width=8) (actual time=9111.694..15156.092 rows=803657 loops=1)"
" Merge Cond: (third.id_second = s.id_second)"
" -> Index Scan using third_id_second_idx on third (cost=0.00..4270478.19 rows=17193788 width=16) (actual time=23.650..5425.137 rows=803658 loops=1)"
" -> Materialize (cost=4577722.81..4588177.38 rows=2090914 width=8) (actual time=9088.030..9354.326 rows=879283 loops=1)"
" -> Sort (cost=4577722.81..4582950.09 rows=2090914 width=8) (actual time=9088.023..9238.426 rows=879283 loops=1)"
" Sort Key: s.id_second"
" Sort Method: external sort Disk: 15480kB"
" -> Merge Join (cost=673389.38..4341477.37 rows=2090914 width=8) (actual time=3662.239..8485.768 rows=879283 loops=1)"
" Merge Cond: (s.id_first = f.id_first)"
" -> Index Scan using second_id_first_idx on second s (cost=0.00..3587838.88 rows=18752248 width=16) (actual time=0.015..4204.308 rows=879284 loops=1)"
" -> Materialize (cost=672960.82..677827.55 rows=973345 width=8) (actual time=3662.216..3855.667 rows=892988 loops=1)"
" -> Sort (cost=672960.82..675394.19 rows=973345 width=8) (actual time=3662.213..3745.975 rows=476519 loops=1)"
" Sort Key: f.id_first"
" Sort Method: external sort Disk: 8400kB"
" -> Index Scan using first_first_date_idx on first f (cost=0.01..568352.90 rows=973345 width=8) (actual time=126.386..3233.134 rows=476519 loops=1)"
" Index Cond: (first_date < '2016-03-03 00:00:00'::timestamp without time zone)"
"Total runtime: 15244.404 ms"
Upvotes: 1
Views: 242
Reputation: 246063
First, it looks like some of the estimates are off.
Try to ANALYZE the tables and see if that changes the query plan chosen.
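For example:
ANALYZE first;
ANALYZE second;
ANALYZE third;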
What might also help is to lower random_page_cost to a value just over 1 and see if that improves the plan.
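You can try it in the current session first and set it in postgresql.conf once you are happy with it:
SET random_page_cost = 1.1;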
It is interesting to note that the index scan on third_id_second_idx in the fast query produces only 17632 rows instead of over 17 million, which I can only explain by assuming that from that row on, the values of id_second no longer match any row in the join of first and second, i.e. the merge join is completed after that.
You can try to exploit that with a rewritten query. Try
JOIN (SELECT id_second, id_third FROM third ORDER BY id_second) t
instead of
JOIN third t
That may result in a better plan since PostgreSQL won't optimize the ORDER BY away, and the planner may decide that since it has to sort third anyway, it may be cheaper to use a merge join. That way you trick the planner into choosing a plan that it wouldn't recognize as ideal. With a different value distribution the planner's original choice would probably be better.
Upvotes: 3