Strange pgsql query performance

Question

I have a relation like this

R ( EDGE INTEGER, DIHEDRAL INTEGER, FACE INTEGER , VALENCY INTEGER)

I tested twice, 64 rows table R and 128 rows table R. but the simpler one takes much more time than the second one. The explain is like below (It shows error on explain.depesz.com). Could anyone help me to check why? thanks.

plan for 64 rows:

HashAggregate  (cost=260.16..260.17 rows=1 width=12) (actual rows=64 loops=1)
->  Nested Loop  (cost=89.44..260.15 rows=1 width=12) (actual rows=256 loops=1)
     Join Filter: ((f1.face < f2.face) AND (e3.edge <> f1.edge) AND (e4.edge <> e3.edge) AND (f1.edge = f2.edge) AND (f1.face =
e3.face))
     Rows Removed by Join Filter: 142606080
     ->  Nested Loop  (cost=41.91..167.59 rows=1 width=16) (actual rows=557056 loops=1)
           ->  Nested Loop  (cost=41.91..125.71 rows=1 width=8) (actual rows=256 loops=1)
                 Join Filter: ((e5.edge <> f2.edge) AND (e5.edge <> e2.edge) AND (e2.face = e5.face))
                 Rows Removed by Join Filter: 1113856
                 ->  Hash Join  (cost=41.91..83.73 rows=1 width=16) (actual rows=512 loops=1)
                       Hash Cond: (f2.face = e2.face)
                       Join Filter: (e2.edge <> f2.edge)
                       Rows Removed by Join Filter: 256
                       ->  Seq Scan on r f2  (cost=0.00..41.76 rows=12 width=8) (actual rows=384 loops=1)
                             Filter: (valency = 3)
                             Rows Removed by Filter: 1920
                       ->  Hash  (cost=41.76..41.76 rows=12 width=8) (actual rows=2176 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 85kB
                             ->  Seq Scan on r e2  (cost=0.00..41.76 rows=12 width=8) (actual rows=2176 loops=1)
                                   Filter: (dihedral = 2)
                                   Rows Removed by Filter: 128
                 ->  Seq Scan on r e5  (cost=0.00..41.76 rows=12 width=8) (actual rows=2176 loops=512)
                       Filter: (dihedral = 2)
                       Rows Removed by Filter: 128
           ->  Seq Scan on r e3  (cost=0.00..41.76 rows=12 width=8) (actual rows=2176 loops=256)
                 Filter: (dihedral = 2)
                 Rows Removed by Filter: 128
     ->  Hash Join  (cost=47.53..92.32 rows=11 width=16) (actual rows=256 loops=557056)
           Hash Cond: (e4.face = f1.face)
           Join Filter: (e4.edge <> f1.edge)
           Rows Removed by Join Filter: 128
           ->  Seq Scan on r e4  (cost=0.00..36.01 rows=2301 width=8) (actual rows=2304 loops=557056)
           ->  Hash  (cost=47.52..47.52 rows=1 width=8) (actual rows=128 loops=1)
                 Buckets: 1024  Batches: 1  Memory Usage: 5kB
                 ->  Seq Scan on r f1  (cost=0.00..47.52 rows=1 width=8) (actual rows=128 loops=1)
                       Filter: ((valency = 3) AND (dihedral = 1))
                       Rows Removed by Filter: 2176
Total runtime: 159268.541 ms
(37 rows)

plan for 128 rows

HashAggregate  (cost=501.28..501.29 rows=1 width=12) (actual rows=128 loops=1)
->  Nested Loop  (cost=171.98..501.27 rows=2 width=12) (actual rows=512 loops=1)
     Join Filter: ((e3.edge <> f1.edge) AND (e4.edge <> e3.edge) AND (f1.face = e3.face))
     Rows Removed by Join Filter: 2227712
     ->  Seq Scan on r e3  (cost=0.00..80.31 rows=22 width=8) (actual rows=4352 loops=1)
           Filter: (dihedral = 2)
           Rows Removed by Filter: 256
     ->  Materialize  (cost=171.98..420.08 rows=2 width=20) (actual rows=512 loops=4352)
           ->  Nested Loop  (cost=171.98..420.07 rows=2 width=20) (actual rows=512 loops=1)
                 Join Filter: ((f1.face < f2.face) AND (f1.edge = f2.edge))
                 Rows Removed by Join Filter: 261632
                 ->  Nested Loop  (cost=80.59..242.23 rows=1 width=8) (actual rows=512 loops=1)
                       Join Filter: ((e5.edge <> f2.edge) AND (e5.edge <> e2.edge) AND (e2.face = e5.face))
                       Rows Removed by Join Filter: 4455936
                       ->  Seq Scan on r e5  (cost=0.00..80.31 rows=22 width=8) (actual rows=4352 loops=1)
                             Filter: (dihedral = 2)
                             Rows Removed by Filter: 256
                       ->  Materialize  (cost=80.59..161.05 rows=2 width=16) (actual rows=1024 loops=4352)
                             ->  Hash Join  (cost=80.59..161.04 rows=2 width=16) (actual rows=1024 loops=1)
                                   Hash Cond: (f2.face = e2.face)
                                   Join Filter: (e2.edge <> f2.edge)
                                   Rows Removed by Join Filter: 512
                                   ->  Seq Scan on r f2  (cost=0.00..80.31 rows=22 width=8) (actual rows=768 loops=1)
                                         Filter: (valency = 3)
                                         Rows Removed by Filter: 3840
                                   ->  Hash  (cost=80.31..80.31 rows=22 width=8) (actual rows=4352 loops=1)
                                         Buckets: 1024  Batches: 1  Memory Usage: 170kB
                                         ->  Seq Scan on r e2  (cost=0.00..80.31 rows=22 width=8) (actual rows=4352 loops=1)
                                               Filter: (dihedral = 2)
                                               Rows Removed by Filter: 256
                 ->  Hash Join  (cost=91.39..177.51 rows=22 width=16) (actual rows=512 loops=512)
                       Hash Cond: (e4.face = f1.face)
                       Join Filter: (e4.edge <> f1.edge)
                       Rows Removed by Join Filter: 256
                       ->  Seq Scan on r e4  (cost=0.00..69.25 rows=4425 width=8) (actual rows=4608 loops=512)
                       ->  Hash  (cost=91.38..91.38 rows=1 width=8) (actual rows=256 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 10kB
                             ->  Seq Scan on r f1  (cost=0.00..91.38 rows=1 width=8) (actual rows=256 loops=1)
                                   Filter: ((valency = 3) AND (dihedral = 1))
                                   Rows Removed by Filter: 4352
 Total runtime: 1262.761 ms
 (41 rows)

user948581 · Accepted Answer

The query planner uses statistics on row counts/index sizes/etc. to estimate how to get the best performance out of a query. A bulk insertion of rows immediately followed by a query may not show best performance, because these statistics may be out of date.

To make sure the planner makes informed choices, you need to issue a call to ANALYZE prior to running your EXPLAIN query.

In your specific scenario, chances are the planner made a bad choice in the first case (the 64 rows) and a good one in the second case (the 128 rows).

Strange pgsql query performance

Answers (1)

Related Questions