Amr Essam
Amr Essam

Reputation: 36

Why does a lateral joins with LIMIT increase execution time?

when I run a query with a lateral join and a LIMIT inside, it uses nested a loop join. But when I remove the LIMIT it uses a Hash Right Join. Why?

EXPLAIN ANALYSE
SELECT proxy.*
FROM jobs
LEFT  OUTER JOIN LATERAL (
        SELECT proxy.*
        FROM proxy
        WHERE jobs.id = proxy.job_id
) proxy ON true

 

Hash Right Join  (cost=2075.47..3029.05 rows=34688 width=12) (actual time=9.951..24.758 rows=35212 loops=1)
Hash Cond: (proxy.job_id = jobs.id)
  ->  Seq Scan on proxy  (cost=0.00..524.15 rows=34015 width=12) (actual time=0.011..2.502 rows=34028 loops=1)
  ->  Hash  (cost=1641.87..1641.87 rows=34688 width=4) (actual time=9.842..9.842 rows=34689 loops=1)
        Buckets: 65536  Batches: 1  Memory Usage: 1732kB
        ->  Index Only Scan using jobs_pkey on jobs  (cost=0.29..1641.87 rows=34688 width=4) (actual time=0.010..4.904 rows=34689 loops=1)
              Heap Fetches: 921

But when I add limits to the query, the actual time jumps from 24 to 150:

EXPLAIN ANALYSE
SELECT proxy.*
FROM jobs
LEFT OUTER JOIN LATERAL (
        SELECT proxy.*
        FROM proxy
        WHERE jobs.id = proxy.job_id
        limit 1
) proxy ON true

 

Nested Loop Left Join  (cost=0.58..290506.19 rows=34688 width=12) (actual time=0.024..155.753 rows=34689 loops=1)
  ->  Index Only Scan using jobs_pkey on jobs  (cost=0.29..1641.87 rows=34688 width=4) (actual time=0.014..3.984 rows=34689 loops=1)
        Heap Fetches: 921
  ->  Limit  (cost=0.29..8.31 rows=1 width=12) (actual time=0.001..0.001 rows=1 loops=34689)
        ->  Index Scan using index_job_proxy_on_job_id on loc_job_source_materials  (cost=0.29..8.31 rows=1 width=12) (actual time=0.001..0.001 rows=1 loops=34689)
              Index Cond: (jobs.id = job_id)

Upvotes: 0

Views: 609

Answers (2)

Andrew Lazarus
Andrew Lazarus

Reputation: 19350

Following on from @LaurenzAlbe's answer, I think we can help more if you show the complete query, so we know why you need a LATERAL join. For the (simplified) requirements you have mentioned so far, I think an equivalent is

SELECT DISTINCT ON(proxy.id) proxy.*
FROM proxy
   RIGHT OUTER JOIN jobs
   ON jobs.id = proxy.job_id;

Also, since you are only outputting columns from proxy, you are effectively doing only an INNER JOIN, but with more computing effort.

Upvotes: 0

Laurenz Albe
Laurenz Albe

Reputation: 247625

The optimizer is smart enough to rewrite your first query to

SELECT proxy.*
FROM proxy
   RIGHT OUTER JOIN jobs
      ON jobs.id = proxy.job_id;

But this optimization cannot be made with the LIMIT clause, so only a nested loop join is possible.

Upvotes: 2

Related Questions