Matt
Matt

Reputation: 20786

Why can't PostgreSQL do this simple FULL JOIN?

Here's a minimal setup with 2 tables a and b each with 3 rows:

CREATE TABLE a (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON a (value);

CREATE TABLE b (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON b (value);

INSERT INTO a (value) VALUES ('x'), ('y'),        (NULL);
INSERT INTO b (value) VALUES        ('y'), ('z'), (NULL);

Here is a LEFT JOIN that works fine as expected:

SELECT * FROM a
LEFT JOIN b ON a.value IS NOT DISTINCT FROM b.value;

with output:

 id | value | id | value 
----+-------+----+-------
  1 | x     |    | 
  2 | y     |  1 | y
  3 |       |  3 | 
(3 rows)

Changing "LEFT JOIN" to "FULL JOIN" gives an error:

SELECT * FROM a
FULL JOIN b ON a.value IS NOT DISTINCT FROM b.value;

ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions

Can someone please answer:

What is a "merge-joinable or hash-joinable join condition" and why joining on a.value IS NOT DISTINCT FROM b.value doesn't fulfill this condition, but a.value = b.value is perfectly fine?

It seems that the only difference is how NULL values are handled. Since the value column is indexed in both tables, running an EXPLAIN on a NULL lookup is just as efficient as looking up values that are non-NULL:

EXPLAIN SELECT * FROM a WHERE value = 'x';
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.67 rows=6 width=36)
   Recheck Cond: (value = 'x'::text)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value = 'x'::text)


EXPLAIN SELECT * FROM a WHERE value ISNULL;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.65 rows=6 width=36)
   Recheck Cond: (value IS NULL)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value IS NULL)

This has been tested with PostgreSQL 9.6.3 and 10beta1.

There has been discussion about this issue, but it doesn't directly answer the above question.

Upvotes: 7

Views: 7835

Answers (2)

David
David

Reputation: 5016

I just solved such a case by replacing the ON condition with "TRUE", and moving the original "ON" condition into a WHERE clause. I don't know the performance impact of this, though.

Upvotes: 2

Laurenz Albe
Laurenz Albe

Reputation: 246688

PostgreSQL implements FULL OUTER JOIN with either a hash or a merge join.

To be eligible for such a join, the join condition has to have the form

<expression using only left table> <operator> <expression using only right table>

Now your join condition does look like this, but PostgreSQL does not have a special IS NOT DISTINCT FROM operator, so it parses your condition into:

(NOT ($1 IS DISTINCT FROM $2))

And such an expression cannot be used for hash or merge joins, hence the error message.

I can think of a way to work around it:

SELECT a_id, NULLIF(a_value, '<null>'),
       b_id, NULLIF(b_value, '<null>')
FROM (SELECT id AS a_id,
             COALESCE(value, '<null>') AS a_value
      FROM a
     ) x
   FULL JOIN
     (SELECT id AS b_id,
             COALESCE(value, '<null>') AS b_value
      FROM b
     ) y
      ON x.a_value = y.b_value;

That works if <null> does not appear anywhere in the value columns.

Upvotes: 8

Related Questions