Frank
Frank

Reputation: 108

Compare two tables with SQL while ignoring null columns

I want to compare the contents of two tables in SQL (postgres). The tables have a matching schema. As output I want those rows that are different. I already checked various approaches (also some posted here on Stackoverflow), but to me it looks like I have a problem when fields in both columns are NULL. Here is what I came up with so far:

SELECT dipi.id AS dipi_id,dipi.foo AS dipi_foo,dipi.bar AS dipi_bar, rvdii.id AS rvdii_id,rvdii.foo AS rvdii_foo,rvdii.bar AS rvdii_bar
FROM schema1.mytable dipi
FULL OUTER JOIN schema2.mytable rvdii on dipi.id = rvdii.id AND dipi.foo = rvdii.foo AND dipi.bar = rvdii.bar
WHERE dipi.id is null or rvdii.id is null

So do a full outer join on all columns and then return those where one of the IDs is null.

But with this query I get back this with some sample data:

 
 dipi_id     dipi_foo     dipi_bar    rvdii_id     rvdii_foo   rvdii_bar   
 3           2016-01-01   NULL        NULL         NULL        NULL
 NULL        NULL         NULL        3            2016-01-01  NULL

As you can see the "dipi_bar", "rvdii_bar" columns contains a null value, which don't match in a join (as I understood this). So in the case above I don't want any output, I want that when both columns/fields are NULL, that they get "matched" in the join (or ignored in the join, whatever works). What is the trick to do this?

Edit: Of course the column is not always null, so deleting from the join is not an option :)

Upvotes: 1

Views: 1669

Answers (3)

Frank
Frank

Reputation: 108

Answer to my own question: I ended up using the approach from https://stackoverflow.com/a/18736060/1368432 which uses UNION, EXCEPT and INTERSECT to get the correct result set. For my case this turned out to be the easiest approach.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1271231

Use the NULL-safe comparison operator, is not distinct from. Note that the WHERE clause needs to change as well. Postgres does not have an xor operator, but you can do:

SELECT dipi.id AS dipi_id,dipi.foo AS dipi_foo,dipi.bar AS dipi_bar,
       rvdii.id AS rvdii_id,rvdii.foo AS rvdii_foo,rvdii.bar AS rvdii_bar
FROM schema1.mytable dipi FULL OUTER JOIN
     schema2.mytable rvdii 
     ON dipi.id IS NOT DISTINCT FROM rvdii.id AND
        dipi.foo IS NOT DISTINCT rvdii.foo AND
        dipi.bar IS NOT DISTINCT FROM rvdii.bar
WHERE dipi.id IS DISTINCT FROM rvdii.id OR
      dipi.foo IS DISTINCT FROM rvdii.foo OR
      dipi.bar IS DISTINCT FROM rvdii.bar;

Because the code allows for NULL matches in the ON, you cannot check for NULL for the WHERE filtering condition.

Upvotes: 0

Jorge Y.
Jorge Y.

Reputation: 1133

I take that you want to see the rows that having the same id in both tables, have differences on the other two fields. If that's the case I think that the following would do the job:

SELECT dipi.id AS dipi_id,dipi.foo AS dipi_foo,dipi.bar AS dipi_bar, rvdii.id AS rvdii_id,rvdii.foo AS rvdii_foo,rvdii.bar AS rvdii_bar
FROM schema1.mytable dipi
INNER JOIN schema2.mytable rvdii on dipi.id = rvdii.id 
WHERE (dipi.foo <> rvdii.foo OR (dipi.foo IS NULL AND rvdii.foo IS NOT NULL) OR (dipi.foo IS NOT NULL AND rvdii.foo IS NULL))
OR (dipi.bar <> rvdii.bar OR (dipi.bar IS NULL AND rvdii.bar IS NOT NULL) OR (dipi.bar IS NOT NULL AND rvdii.bar IS NULL))

Upvotes: 1

Related Questions