JOIN is MUCH slower than a UNION, even with indexes

Question

I have a query:

SELECT study."id"
FROM study
JOIN report ON (report."studyId" = study."id")
WHERE 
study.facts->'patientName'->>'value' = 'HELLO WORLD' OR 
report.variables->'patientName'->>'value' = 'HELLO WORLD'

All tables have indexes.

Why is this query taking 4.5s across 6000 rows? Explain analyze output below:

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                                                                 |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Hash Join  (cost=383.69..1403.67 rows=39 width=52) (actual time=2734.257..2734.260 rows=0 loops=1)                                                                                         |
|   Hash Cond: ((study.id)::text = (report."studyId")::text)                                                                                                                                 |
|   Join Filter: ((((study.facts -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text) OR (((report.variables -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text)) |
|   Rows Removed by Join Filter: 7453                                                                                                                                                        |
|   ->  Seq Scan on study  (cost=0.00..1000.23 rows=7523 width=70) (actual time=0.020..13.548 rows=7523 loops=1)                                                                             |
|   ->  Hash  (cost=290.53..290.53 rows=7453 width=70) (actual time=5.052..5.053 rows=7453 loops=1)                                                                                          |
|         Buckets: 8192  Batches: 1  Memory Usage: 808kB                                                                                                                                     |
|         ->  Seq Scan on report  (cost=0.00..290.53 rows=7453 width=70) (actual time=0.014..3.235 rows=7453 loops=1)                                                                        |
| Planning Time: 0.896 ms                                                                                                                                                                    |
| Execution Time: 2734.323 ms                                                                                                                                                                |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I have a UNION query that does the same thing, but is much quicker (0.001s). I want to know more about why my JOIN query is so much slower:

SELECT id::text
FROM study
WHERE study.facts->'patientName'->>'value' = 'HELLO WORLD'
UNION
SELECT report."studyId"::text
FROM report
WHERE report.variables->'patientName'->>'value' = 'HELLO WORLD';

+-------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                      |
|-------------------------------------------------------------------------------------------------------------------------------------------------|
| HashAggregate  (cost=143.12..143.51 rows=39 width=32) (actual time=0.040..0.041 rows=0 loops=1)                                                 |
|   Group Key: ((study.id)::text)                                                                                                                 |
|   ->  Append  (cost=4.58..143.02 rows=39 width=32) (actual time=0.038..0.039 rows=0 loops=1)                                                    |
|         ->  Bitmap Heap Scan on study  (cost=4.58..134.14 rows=38 width=32) (actual time=0.026..0.026 rows=0 loops=1)                           |
|               Recheck Cond: (((facts -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text)                                          |
|               ->  Bitmap Index Scan on "IDX_facts_patientName"  (cost=0.00..4.57 rows=38 width=0) (actual time=0.023..0.023 rows=0 loops=1)     |
|                     Index Cond: (((facts -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text)                                      |
|         ->  Index Scan using "IDX_variables_patientName" on report  (cost=0.28..8.30 rows=1 width=32) (actual time=0.012..0.012 rows=0 loops=1) |
|               Index Cond: (((variables -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text)                                        |
| Planning Time: 0.560 ms                                                                                                                         |
| Execution Time: 0.103 ms                                                                                                                        |
+-------------------------------------------------------------------------------------------------------------------------------------------------+

wildplasser · Accepted Answer

OR is always difficult for the optimiser(s). Since you dont need any fields from the report table, you could tuck that away into an EXISTS() subquery, this will probably result in a bitmap-index scan.:

SELECT s."id"
FROM study s
WHERE s.facts->'patientName'->>'value' = 'HELLO WORLD'
OR  EXISTS (
        SELECT * FROM report r
        WHERE r."studyId" = s."id"
        AND r.variables->'patientName'->>'value' = 'HELLO WORLD'
        );

JOIN is MUCH slower than a UNION, even with indexes

Answers (2)

Related Questions