Reputation: 20786
Here's a minimal setup with two tables, a and b, each with 3 rows:
CREATE TABLE a (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON a (value);
CREATE TABLE b (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON b (value);
INSERT INTO a (value) VALUES ('x'), ('y'), (NULL);
INSERT INTO b (value) VALUES ('y'), ('z'), (NULL);
Here is a LEFT JOIN that works as expected:
SELECT * FROM a
LEFT JOIN b ON a.value IS NOT DISTINCT FROM b.value;
with output:
 id | value | id | value
----+-------+----+-------
  1 | x     |    |
  2 | y     |  1 | y
  3 |       |  3 |
(3 rows)
Changing "LEFT JOIN" to "FULL JOIN" gives an error:
SELECT * FROM a
FULL JOIN b ON a.value IS NOT DISTINCT FROM b.value;
ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
Can someone please answer:
What is a "merge-joinable or hash-joinable join condition", and why does joining on a.value IS NOT DISTINCT FROM b.value not fulfill this condition, while a.value = b.value is perfectly fine?
It seems that the only difference is how NULL values are handled. Since the value column is indexed in both tables, EXPLAIN shows that a NULL lookup is planned the same way, and at essentially the same estimated cost, as a lookup of a non-NULL value:
EXPLAIN SELECT * FROM a WHERE value = 'x';
                                QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.67 rows=6 width=36)
   Recheck Cond: (value = 'x'::text)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value = 'x'::text)
EXPLAIN SELECT * FROM a WHERE value ISNULL;
                                QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.65 rows=6 width=36)
   Recheck Cond: (value IS NULL)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value IS NULL)
This has been tested with PostgreSQL 9.6.3 and 10beta1.
There has been discussion about this issue, but it doesn't directly answer the above question.
Upvotes: 7
Views: 7835
Reputation: 5016
I just solved such a case by replacing the ON condition with "TRUE", and moving the original "ON" condition into a WHERE clause. I don't know the performance impact of this, though.
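For reference, here is a sketch of what that rewrite looks like against the tables a and b from the question. One caveat, which follows from how WHERE is applied after the join: with ON TRUE every row of a pairs with every row of b, so no NULL-extended rows are produced (as long as both tables are non-empty), and the WHERE clause then keeps only the matching pairs. The result is therefore the same as an inner join, not a full outer join.

```sql
-- Rewrite described above: join condition moved from ON to WHERE.
-- Runs without the "merge-joinable or hash-joinable" error, but the
-- unmatched rows ('x' from a, 'z' from b) are filtered out.
SELECT *
FROM a
FULL JOIN b ON TRUE
WHERE a.value IS NOT DISTINCT FROM b.value;
```

With the sample data this returns only the 'y' pair and the NULL pair, so it is a fit only when the unmatched rows are not needed.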
Upvotes: 2
Reputation: 246688
PostgreSQL implements FULL OUTER JOIN with either a hash or a merge join.
To be eligible for such a join, the join condition has to have the form
<expression using only left table> <operator> <expression using only right table>
Now your join condition does look like this, but PostgreSQL does not have a special IS NOT DISTINCT FROM operator, so it parses your condition into:
(NOT ($1 IS DISTINCT FROM $2))
And such an expression cannot be used for hash or merge joins, hence the error message.
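If you want to see which operators qualify, the system catalog records it. The following is a small sketch, assuming the pg_operator columns oprcanmerge and oprcanhash, which mark an operator as usable for merge joins and hash joins respectively; it looks up the = operator for text, the one used by a.value = b.value.

```sql
-- Look up the "=" operator for text on both sides and show whether
-- the planner may use it for merge joins and hash joins.
SELECT oprname,
       oprleft::regtype  AS left_type,
       oprright::regtype AS right_type,
       oprcanmerge,
       oprcanhash
FROM pg_operator
WHERE oprname  = '='
  AND oprleft  = 'text'::regtype
  AND oprright = 'text'::regtype;
```

Both flags should come back true for this operator, which is why the plain equality join is accepted.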
I can think of a way to work around it:
SELECT a_id, NULLIF(a_value, '<null>'),
       b_id, NULLIF(b_value, '<null>')
FROM (SELECT id AS a_id,
             COALESCE(value, '<null>') AS a_value
      FROM a
     ) x
   FULL JOIN
     (SELECT id AS b_id,
             COALESCE(value, '<null>') AS b_value
      FROM b
     ) y
   ON x.a_value = y.b_value;
That works if <null> does not appear anywhere in the value columns.
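If a sentinel value is not acceptable, another option is to build the full join by hand: take the LEFT JOIN, which works with IS NOT DISTINCT FROM as the question shows, and append the rows of b that have no partner in a. This is a sketch of a different technique rather than a variant of the query above; the column aliases a_id, a_value, b_id and b_value are just names chosen for illustration.

```sql
-- Part 1: every row of a, with its matching rows of b (if any).
SELECT a.id AS a_id, a.value AS a_value,
       b.id AS b_id, b.value AS b_value
FROM a
LEFT JOIN b ON a.value IS NOT DISTINCT FROM b.value
UNION ALL
-- Part 2: rows of b with no match in a, NULL-extended on the a side.
SELECT NULL, NULL, b.id, b.value
FROM b
WHERE NOT EXISTS (
    SELECT 1
    FROM a
    WHERE a.value IS NOT DISTINCT FROM b.value
);
```

With the sample data this gives four rows: 'x' unmatched, the 'y' pair, the NULL pair, and 'z' unmatched. The price is that both tables are scanned twice, so it is worth comparing plans with EXPLAIN on real data.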
Upvotes: 8