Zanko
Zanko

Reputation: 4694

Subquery is very slow when add another column query

SELECT a1.object_id,
       (SELECT COALESCE(json_agg(b1), '[]')
        FROM (
                 SELECT c1.root_id,
                        (SELECT COALESCE(json_agg(d1), '[]')
                         FROM (
                                  SELECT e1.root_id,
                                         (SELECT COALESCE(json_agg(f1), '[]')
                                          FROM (
                                                   SELECT g1.root_id,
                                                          (SELECT SUM(h2.amount)
                                                           FROM table_5 AS h1
                                                                    LEFT OUTER JOIN table_6 AS h2 on h1.hash_id = h2.hash_id
                                                           WHERE g1.object_id = h1.root_id
                                                             AND h2.mode = 'Real'
                                                             AND h2.type = 'Reject'
                                                          ) AS amount
                                                   FROM table_4 AS g1
                                                            LEFT OUTER JOIN table_5 AS g2
                                                                            ON g1.general_id = g2.object_id
                                                            LEFT OUTER JOIN table_6 AS g3
                                                                            ON g1.properties_hash_id = g3.hash_id
                                                   WHERE e1.object_id = g1.root_id
                                               ) AS f1) AS tickets
                                  FROM table_3 AS e1
                                  WHERE c1.object_id = e1.root_id
                              ) AS d1) AS blocks
                 FROM table_2 AS c1
                 WHERE a1.object_id = c1.root_id
             ) AS b1) AS sources
FROM table_1 AS a1
...
SubPlan 1
->  Aggregate  (cost=22.77..22.78 rows=1 width=32) (actual time=0.296..0.296 rows=1 loops=16502)
      ->  Nested Loop  (cost=10.72..22.76 rows=1 width=6) (actual time=0.184..0.295 rows=1 loops=16502)
            ->  Index Scan using t041_type_idx on table_6 h2  (cost=0.15..8.17 rows=1 width=10) (actual time=0.001..0.007 rows=20 loops=16502)
                  Index Cond: (type = 'Reject'::valid_optimized_segment_types)
                  Filter: (mode = 'Real'::valid_optimized_segment_modes)
                  Rows Removed by Filter: 20
            ->  Bitmap Heap Scan on table_5 h1  (cost=10.57..14.58 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=330040)
                  Recheck Cond: ((g1.object_id = root_id) AND (hash_id = h2.hash_id))
                  Heap Blocks: exact=12379
                  ->  BitmapAnd  (cost=10.57..10.57 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=330040)
                        ->  Bitmap Index Scan on t031_root_id_fkey  (cost=0.00..4.32 rows=3 width=0) (actual time=0.001..0.001 rows=2 loops=330040)
                              Index Cond: (root_id = g1.object_id)
                        ->  Bitmap Index Scan on t031_md5_id_idx  (cost=0.00..6.00 rows=228 width=0) (actual time=0.010..0.010 rows=619 loops=330040)
                              Index Cond: (hash_id = h2.hash_id)
Planning Time: 0.894 ms
Execution Time: 4925.776 ms
SubPlan 1
            ->  Aggregate  (cost=25.77..25.78 rows=1 width=32) (actual time=0.045..0.046 rows=1 loops=16502)
                  ->  Hash Join  (cost=15.32..25.77 rows=2 width=6) (actual time=0.019..0.041 rows=1 loops=16502)
                        Hash Cond: (h2.hash_id = h1.hash_id)
                        ->  Seq Scan on table_6 h2  (cost=0.00..8.56 rows=298 width=10) (actual time=0.001..0.026 rows=285 loops=16502)
                              Filter: (mode = 'Real'::valid_optimized_segment_modes)
                              Rows Removed by Filter: 80
                        ->  Hash  (cost=15.28..15.28 rows=3 width=4) (actual time=0.002..0.002 rows=2 loops=16502)
                              Buckets: 1024  Batches: 1  Memory Usage: 9kB
                              ->  Index Scan using t031_root_id_fkey on table_5 h1  (cost=0.29..15.28 rows=3 width=4) (actual time=0.001..0.002 rows=2 loops=16502)
                                    Index Cond: (root_id = g1.object_id)
Planning Time: 0.889 ms
Execution Time: 787.264 ms
  1. Why is this single line causing so much differences (h2.type is BTREE indexed)?
  2. How else can I achieve the same result with better efficiency?

Upvotes: 0

Views: 146

Answers (1)

jjanes
jjanes

Reputation: 44323

->  Index Scan using t041_type_idx on table_6 h2  (cost=0.15..8.17 rows=1 width=10) (actual time=0.001..0.007 rows=20 loops=16502)

It thinks there will be 1 row, and there are 20. Meaning the next node is executed 20 times more than it thought it would be. If it knew here would be 20, it might have chosen a different plan.

You could try to create cross-column statistics so it can get a better estimate for that row count.

Or you could just speed up the next node, by adding the multi column index on table_5 (root_id,hash_id). It would still get executed far more times than the planner thinks it will, but each execution will be faster.

Or you might just force it into a different plan by making one of those indexes unusable:

...JOIN table_6 AS h2 on h1.hash_id+0 = h2.hash_id

Upvotes: 1

Related Questions