Incorrect rows estimate for joins

Question

I have a simple query (Postgres 9.4):

EXPLAIN ANALYZE
SELECT
    COUNT(*)
FROM
    bo_labels L
    LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
    LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
    LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE
    party.inn = '?'

The EXPLAIN ANALYZE output looks like this:

QUERY PLAN                                                                                                                                                                                           
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
 Aggregate  (cost=2385.30..2385.30 rows=1 width=0) (actual time=31762.367..31762.367 rows=1 loops=1)                                                                                                  
   ->  Nested Loop Left Join  (cost=1.28..2385.30 rows=1 width=0) (actual time=7.621..31760.776 rows=1694 loops=1)                                                                                    
         Join Filter: ((c.bo_document_fkey)::text = (d.id)::text)                                                                                                                                     
         Rows Removed by Join Filter: 101658634                                                                                                                                                       
         ->  Nested Loop Left Join  (cost=1.28..106.33 rows=1 width=10) (actual time=0.110..54.635 rows=1694 loops=1)                                                                                 
               ->  Nested Loop  (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)                                                                                       
                     ->  Index Scan using bo_party_inn_idx on bo_party party  (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)                                            
                           Index Cond: (inn = '2534005760'::text)                                                                                                                                     
                     ->  Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l  (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3) 
                           Index Cond: (bo_party_fkey = (party.id)::text)                                                                                                                             
                           Heap Fetches: 0                                                                                                                                                            
               ->  Index Only Scan using bo_document_pkey on bo_document_base d  (cost=0.43..0.64 rows=1 width=10) (actual time=0.022..0.025 rows=1 loops=1694)                                       
                     Index Cond: (id = (l.bo_doc_base_fkey)::text)                                                                                                                                    
                     Heap Fetches: 1134                                                                                                                                                               
         ->  Seq Scan on bo_contract_hardwood_deal c  (cost=0.00..2069.77 rows=59770 width=9) (actual time=0.003..11.829 rows=60012 loops=1694)                                                       
 Planning time: 13.484 ms                                                                                                                                                                             
 Execution time: 31762.885 ms

http://explain.depesz.com/s/V2wn

What is very annoying is the incorrect row estimate:

Nested Loop  (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)

Because of that estimate, Postgres chooses nested loops and the query runs for about 30 seconds. With SET LOCAL enable_nestloop = OFF; it completes in about a second.
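For reference, a minimal sketch of how the SET LOCAL test above can be run. SET LOCAL only lasts until the end of the current transaction, so the setting is scoped to the surrounding BEGIN/COMMIT block rather than attached to the query text itself. The query is the one from this question; nothing else is assumed.

```sql
-- Sketch: disable nested loops for one transaction only.
BEGIN;

-- Reverts automatically at COMMIT or ROLLBACK; other sessions are unaffected.
SET LOCAL enable_nestloop = off;

SELECT
    COUNT(*)
FROM
    bo_labels L
    LEFT JOIN bo_party party ON (party.id = L.bo_party_fkey)
    LEFT JOIN bo_document_base D ON (D.id = L.bo_doc_base_fkey)
    LEFT JOIN bo_contract_hardwood_deal C ON (C.bo_document_fkey = D.id)
WHERE
    party.inn = '?';

COMMIT;
```

This only works where the application can wrap the statement in its own transaction; it is not a per-query hint.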

What is also interesting: I have default_statistics_target = 10000 (the maximum value), and I ran VACUUM VERBOSE ANALYZE on all 4 tables just before.

Since Postgres does not gather cross-table statistics, the same kind of misestimate is likely to happen for other joins too.

Without the external extension pg_hint_plan, it is not possible to change enable_nestloop for just that query.

Is there some other way I could force a faster plan for this query?

Update in response to comments

I can't eliminate the join in the general case. What I am mainly looking for: is there any possibility to change the statistics (for example) so that they include the desired values, even if that breaks their normal statistical shape? Or maybe some other way to make Postgres raise the cost weight of nested loops so that it uses them less often?
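One knob of this kind that does exist is the per-column n_distinct override, which replaces the distinct-value count that ANALYZE would otherwise compute. The sketch below is an illustration only: the column choice and the value 2000 are hypothetical, and whether the override moves this particular join estimate has not been tested here.

```sql
-- Hypothetical override: tell the planner that bo_labels.bo_party_fkey
-- has 2000 distinct values (a negative value would instead mean a
-- fraction of the row count, e.g. -0.5 = half as many distinct values as rows).
ALTER TABLE bo_labels ALTER COLUMN bo_party_fkey SET (n_distinct = 2000);

-- The override only takes effect at the next ANALYZE of the table.
ANALYZE bo_labels;

-- To go back to the computed value:
ALTER TABLE bo_labels ALTER COLUMN bo_party_fkey RESET (n_distinct);
ANALYZE bo_labels;
```

Comparing the EXPLAIN row estimates before and after the change shows whether the override has any effect on the plan.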

Could someone also explain, or point to documentation on, how the Postgres planner, for a nested loop over two inputs of 3 rows (exactly correct) and 1289 rows (really 565, but that error is a separate question), arrived at the assumption that the result will contain only 1 row? I am talking about this part of the plan:

           ->  Nested Loop  (cost=0.85..105.69 rows=1 width=9) (actual time=0.081..4.404 rows=1694 loops=1)                                                                                       
                 ->  Index Scan using bo_party_inn_idx on bo_party party  (cost=0.43..12.43 rows=3 width=10) (actual time=0.031..0.037 rows=3 loops=1)                                            
                       Index Cond: (inn = '2534005760'::text)                                                                                                                                     
                 ->  Index Only Scan using bo_labels__party_fkey__docbase_fkey__tnved_fkey__idx on bo_labels l  (cost=0.42..29.80 rows=1289 width=17) (actual time=0.013..1.041 rows=565 loops=3) 
                       Index Cond: (bo_party_fkey = (party.id)::text)

At first glance it looks plainly wrong. Which statistics are used there, and how? Does Postgres also maintain some statistics for indexes?
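The inputs the planner works from can be inspected directly. The sketch below reads the per-column statistics and table sizes for the two joined tables; the table and column names are taken from the plan above. As described in the "Row Estimation Examples" chapter of the PostgreSQL documentation, when most-common-value lists are not available for both columns, the join estimate is roughly outer rows × inner table rows × (1 - null_frac1) × (1 - null_frac2) / max(n_distinct1, n_distinct2). It is an assumption, not something verified here, that this is the path taken for this join.

```sql
-- Per-column statistics gathered by ANALYZE.
-- n_distinct > 0 is an absolute count; n_distinct < 0 is a fraction of the row count.
SELECT tablename,
       attname,
       null_frac,
       n_distinct,
       most_common_vals IS NOT NULL AS has_mcv
FROM pg_stats
WHERE (tablename = 'bo_party'  AND attname IN ('id', 'inn'))
   OR (tablename = 'bo_labels' AND attname = 'bo_party_fkey');

-- Table row counts as the planner sees them.
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('bo_party', 'bo_labels');
```

If bo_party.id has far more distinct values than 3 × reltuples of bo_labels, the formula above yields less than one row, which the planner reports as rows=1. The 1289 on the inner index scan is a separate per-loop estimate.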
