Somu Sinhhaa
Somu Sinhhaa

Reputation: 163

Avoid using Nested Loop Join while using a non Equi join condition

Postgres is using a Nested Loop Join algorithm when I use a non equi join condition in my update query. I understand that the Nested Loop Join can be very costly as the right relation is scanned once for every row found in the left relation as per [https://www.postgresql.org/docs/8.3/planner-optimizer.html]

The update query and the execution plan is below.

Query

explain analyze
UPDATE target_tbl tgt
set descr     = stage.descr,
    prod_name = stage.prod_name,
    item_name = stage.item_name,
    url       = stage.url,
    col1_name = stage.col1_name,
    col2_name = stage.col2_name,
    col3_name = stage.col3_name,
    col4_name = stage.col4_name,
    col5_name = stage.col5_name,
    col6_name = stage.col6_name,
    col7_name = stage.col7_name,
    col8_name = stage.col8_name,
    flag      = stage.flag
from tbl1 stage
where tgt.col1 = stage.col1
  and tgt.col2 = stage.col2
  and coalesce(tgt.col3, 'col3'::text) = coalesce(stage.col3, 'col3'::text)
  and coalesce(tgt.col4, 'col4'::text) = coalesce(stage.col4, 'col4'::text)
  and stage.row_number::int >= 1::int
  and stage.row_number::int < 50001::int;

Execution Plan

Update on target_tbl tgt  (cost=0.56..3557.91 rows=1 width=813) (actual time=346153.460..346153.460 rows=0 loops=1)
  ->  Nested Loop  (cost=0.56..3557.91 rows=1 width=813) (actual time=4.326..163876.029 rows=50000 loops=1)
        ->  Seq Scan on tbl1 stage  (cost=0.00..2680.96 rows=102 width=759) (actual time=3.060..2588.745 rows=50000 loops=1)
              Filter: (((row_number)::integer >= 1) AND ((row_number)::integer < 50001))
        ->  Index Scan using tbl_idx on target_tbl tgt  (cost=0.56..8.59 rows=1 width=134) (actual time=3.152..3.212 rows=1 loops=50000)
              Index Cond: ((col1 = stage.col1) AND (col2 = stage.col2) AND (COALESCE(col3, 'col3'::text) = COALESCE(stage.col3, 'col3'::text)) AND (COALESCE(col4, 'col4'::text) = COALESCE(stage.col4, 'col4'::text)))
Planning time: 17.700 ms
Execution time: 346157.168 ms
  1. Is there any way to avoid the nested loop join during the execution of the above query?

  2. Or is there a way that can help me to reduce the cost of the the nested loop scan, currently it takes 6-7 minutes to update just 50000 records?

Upvotes: 1

Views: 633

Answers (2)

jjanes
jjanes

Reputation: 44363

I understand that the Nested Loop Join can be very costly as the right relation is scanned once for every row found in the left relation

But the "right relation" here is an index scan, not a scan of the full table.

You can get it to stop using the index by changing the leading column of the join condition to something like where tgt.col1+0 = stage.col1 .... Upon doing that, it will probably change to a hash join or a merge join, but you will have to try it and see if it does. Also, the new plan may not actually be faster. (And fixing the estimation problem would be preferable, if that works)

Or is there a way that can help me to reduce the cost of the the nested loop scan, currently it takes 6-7 minutes to update just 50000 records?

Your plan shows that over half the time is spent on the update itself, so probably reducing the cost of just the nested loop scan can have only a muted impact on the overall time. Do you have a lot of indexes on the table? The maintenance of those indexes might be a major bottleneck.

Upvotes: 1

Laurenz Albe
Laurenz Albe

Reputation: 247625

PostgreSQL can choose a different join strategy in that case. The reason why it doesn't is the gross mis-estimate in the sequential scan: 102 instead of 50000.

Fix that problem, and things will get better:

ANALYZE tbl1;

If that is not enough, collect more detailed statistics:

ALTER TABLE tbl1 ALTER row_number SET STATISTICS 1000;
ANALYZE tbl1;

All this assumes that row_number is an integer and the type cast is redundant. If you made the mistake to use a different data type, an index is your only hope:

CREATE INDEX ON tbl1 ((row_number::integer));
ANALYZE tbl1;

Upvotes: 1

Related Questions