Dominic Bou-Samra
Dominic Bou-Samra

Reputation: 15416

JOIN is MUCH slower than a UNION, even with indexes

I have a query:

SELECT study."id"
FROM study
JOIN report ON (report."studyId" = study."id")
WHERE 
study.facts->'patientName'->>'value' = 'HELLO WORLD' OR 
report.variables->'patientName'->>'value' = 'HELLO WORLD'

All tables have indexes.

Why is this query taking 4.5s across 6000 rows? Explain analyze output below:

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                                                                 |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Hash Join  (cost=383.69..1403.67 rows=39 width=52) (actual time=2734.257..2734.260 rows=0 loops=1)                                                                                         |
|   Hash Cond: ((study.id)::text = (report."studyId")::text)                                                                                                                                 |
|   Join Filter: ((((study.facts -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text) OR (((report.variables -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text)) |
|   Rows Removed by Join Filter: 7453                                                                                                                                                        |
|   ->  Seq Scan on study  (cost=0.00..1000.23 rows=7523 width=70) (actual time=0.020..13.548 rows=7523 loops=1)                                                                             |
|   ->  Hash  (cost=290.53..290.53 rows=7453 width=70) (actual time=5.052..5.053 rows=7453 loops=1)                                                                                          |
|         Buckets: 8192  Batches: 1  Memory Usage: 808kB                                                                                                                                     |
|         ->  Seq Scan on report  (cost=0.00..290.53 rows=7453 width=70) (actual time=0.014..3.235 rows=7453 loops=1)                                                                        |
| Planning Time: 0.896 ms                                                                                                                                                                    |
| Execution Time: 2734.323 ms                                                                                                                                                                |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I have a UNION query that does the same thing, but is much quicker (0.001s). I want to know more about why my JOIN query is so much slower:

SELECT id::text
FROM study
WHERE study.facts->'patientName'->>'value' = 'HELLO WORLD'
UNION
SELECT report."studyId"::text
FROM report
WHERE report.variables->'patientName'->>'value' = 'HELLO WORLD';
+-------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                      |
|-------------------------------------------------------------------------------------------------------------------------------------------------|
| HashAggregate  (cost=143.12..143.51 rows=39 width=32) (actual time=0.040..0.041 rows=0 loops=1)                                                 |
|   Group Key: ((study.id)::text)                                                                                                                 |
|   ->  Append  (cost=4.58..143.02 rows=39 width=32) (actual time=0.038..0.039 rows=0 loops=1)                                                    |
|         ->  Bitmap Heap Scan on study  (cost=4.58..134.14 rows=38 width=32) (actual time=0.026..0.026 rows=0 loops=1)                           |
|               Recheck Cond: (((facts -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text)                                          |
|               ->  Bitmap Index Scan on "IDX_facts_patientName"  (cost=0.00..4.57 rows=38 width=0) (actual time=0.023..0.023 rows=0 loops=1)     |
|                     Index Cond: (((facts -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text)                                      |
|         ->  Index Scan using "IDX_variables_patientName" on report  (cost=0.28..8.30 rows=1 width=32) (actual time=0.012..0.012 rows=0 loops=1) |
|               Index Cond: (((variables -> 'patientName'::text) ->> 'value'::text) = 'HELLO WORLD'::text)                                        |
| Planning Time: 0.560 ms                                                                                                                         |
| Execution Time: 0.103 ms                                                                                                                        |
+-------------------------------------------------------------------------------------------------------------------------------------------------+

Upvotes: 1

Views: 87

Answers (2)

wildplasser
wildplasser

Reputation: 44240

OR is always difficult for the optimiser(s). Since you dont need any fields from the report table, you could tuck that away into an EXISTS() subquery, this will probably result in a bitmap-index scan.:


SELECT s."id"
FROM study s
WHERE s.facts->'patientName'->>'value' = 'HELLO WORLD'
OR  EXISTS (
        SELECT * FROM report r
        WHERE r."studyId" = s."id"
        AND r.variables->'patientName'->>'value' = 'HELLO WORLD'
        );

Upvotes: 2

Rui Costa
Rui Costa

Reputation: 437

JOINS and UNIONS are two completely different operations.

  • JOIN adds columns from both tables to the result set, matching both tables with you matching criteria ( WHERE CLAUSE )

  • UNION on the others hand, append the result of 1 table on top of the other. In more detail, the difference between an UNION and a UNION ALL, is that the union is a SELECT DISTINCT from the UNION ALL

In summary, the UNION does not require a matching criteria to select the ROWS to add to the result.

Upvotes: 2

Related Questions