Reputation: 5349
I have simple query (Postgres 9.4):
EXPLAIN ANALYZE
SELECT
COUNT(*)
FROM
bo_labels L
LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE
party.inn = '?'
Explain looks like:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2385.30..2385.30 rows=1 width=0) (actual time=31762.367..31762.367 rows=1 loops=1)
-> Nested Loop Left Join (cost=1.28..2385.30 rows=1 width=0) (actual time=7.621..31760.776 rows=1694 loops=1)
Join Filter: ((c.bo_document_fkey)::text = (d.id)::text)
Rows Removed by Join Filter: 101658634
-> Nested Loop Left Join (cost=1.28..106.33 rows=1 width=10) (actual time=0.110..54.635 rows=1694 loops=1)
-> Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
-> Index Scan using bo_party_inn_idx on bo_party party (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)
Index Cond: (inn = '2534005760'::text)
-> Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3)
Index Cond: (bo_party_fkey = (party.id)::text)
Heap Fetches: 0
-> Index Only Scan using bo_document_pkey on bo_document_base d (cost=0.43..0.64 rows=1 width=10) (actual time=0.022..0.025 rows=1 loops=1694)
Index Cond: (id = (l.bo_doc_base_fkey)::text)
Heap Fetches: 1134
-> Seq Scan on bo_contract_hardwood_deal c (cost=0.00..2069.77 rows=59770 width=9) (actual time=0.003..11.829 rows=60012 loops=1694)
Planning time: 13.484 ms
Execution time: 31762.885 ms
http://explain.depesz.com/s/V2wn
What is very annoying is incorrect estimate of rows:
Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
Because that postgres choose nested loops and query run about 30 seconds.
With SET LOCAL enable_nestloop = OFF;
it accomplished just in a second.
What is also interesting, I have default_statistics_target
= 10000 (at max value) and on all 4 tables run VACUUM VERBOSE ANALYZE
just before.
As postgres does not gather statistic between tables such case is very likely possible to happens for other joins too.
Without external extension pghintplan it is not possible change enable_nestloop
for just that query.
Is there some other way I could try to force use more speedy way to accomplish that query?
I can't eliminate join in common way. My main search is there any possibilities change statistic (for example) to include desired values which break normal statistical appearance? May be other way to force postgres to change weight of nested loops
to use it not so frequently?
Could also someone explain or point to documentation how postgres analyzer for nested loops of two results with 3 (exact correct) and 1289 (which will really 565, but actually such error different question) rows made assumption what in result will be only 1 row??? I've speak about that part of plan:
-> Nested Loop (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)
-> Index Scan using bo_party_inn_idx on bo_party party (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)
Index Cond: (inn = '2534005760'::text)
-> Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3)
Index Cond: (bo_party_fkey = (party.id)::text)
On first glance it looks initially wrong. What statistics used there and how? Does postgres maintain also some statistics for indexes?
Upvotes: 2
Views: 1427
Reputation: 2187
Actually, I don't have a good sample data to test my answer but I think it might help.
Based on your join columns I'm assuming the following relationship cardinality:
1) bo_party (id 1:N bo_party_fkey) bo_labels 2) bo_labels (bo_doc_base_fkey N:1 id) bo_document_base 3) bo_document_base (id 1:N bo_document_fkey) bo_contract_hardwood_deal
You want to count how much rows were selected. So, based on the cardinality in 1) and 2) the table "bo_labels" have a many to many relationship. This means that joining it with "bo_party" and "bo_document_base" will produce no more rows than the ones existing in the table.
But, after joining "bo_document_base", another join is done to "bo_contract_hardwood_deal" which cardinality described in 3) is one to many, perhaps generating more rows in the final result.
This way, to find the right count of rows you can simplify the join structure to "bo_labels" and "bo_contract_hardwood_deal" through:
4) bo_labels (bo_doc_base_fkey 1:N bo_document_fkey) bo_contract_hardwood_deal
A sample query could be one of the following:
SELECT COUNT(*)
FROM bo_labels L
LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = L.bo_doc_base_fkey)
WHERE 1=1
and exists
(
select 1
from bo_party party
where 1=1
and party.id = L.bo_party_fkey
and party.inn = '?'
)
;
or
SELECT sum((select COUNT(*) from bo_contract_hardwood_deal C where C.bo_document_fkey = L.bo_doc_base_fkey))
FROM bo_labels L
WHERE 1=1
and exists
(
select 1
from bo_party party
where 1=1
and party.id = L.bo_party_fkey
and party.inn = '?'
)
;
I could not test with large tables, so I don't know exactly if it will improve performance against your original query, but I think it might help.
Upvotes: 1