Reputation: 108
I want to compare the contents of two tables in SQL (postgres). The tables have a matching schema. As output I want those rows that are different. I already checked various approaches (also some posted here on Stackoverflow), but to me it looks like I have a problem when fields in both columns are NULL. Here is what I came up with so far:
SELECT dipi.id AS dipi_id,dipi.foo AS dipi_foo,dipi.bar AS dipi_bar, rvdii.id AS rvdii_id,rvdii.foo AS rvdii_foo,rvdii.bar AS rvdii_bar
FROM schema1.mytable dipi
FULL OUTER JOIN schema2.mytable rvdii on dipi.id = rvdii.id AND dipi.foo = rvdii.foo AND dipi.bar = rvdii.bar
WHERE dipi.id is null or rvdii.id is null
So do a full outer join on all columns and then return those where one of the IDs is null.
But with this query I get back this with some sample data:
dipi_id dipi_foo dipi_bar rvdii_id rvdii_foo rvdii_bar 3 2016-01-01 NULL NULL NULL NULL NULL NULL NULL 3 2016-01-01 NULL
As you can see the "dipi_bar", "rvdii_bar" columns contains a null value, which don't match in a join (as I understood this). So in the case above I don't want any output, I want that when both columns/fields are NULL, that they get "matched" in the join (or ignored in the join, whatever works). What is the trick to do this?
Edit: Of course the column is not always null, so deleting from the join is not an option :)
Upvotes: 1
Views: 1669
Reputation: 108
Answer to my own question: I ended up using the approach from https://stackoverflow.com/a/18736060/1368432 which uses UNION
, EXCEPT
and INTERSECT
to get the correct result set. For my case this turned out to be the easiest approach.
Upvotes: 1
Reputation: 1271231
Use the NULL
-safe comparison operator, is not distinct from
. Note that the WHERE
clause needs to change as well. Postgres does not have an xor operator, but you can do:
SELECT dipi.id AS dipi_id,dipi.foo AS dipi_foo,dipi.bar AS dipi_bar,
rvdii.id AS rvdii_id,rvdii.foo AS rvdii_foo,rvdii.bar AS rvdii_bar
FROM schema1.mytable dipi FULL OUTER JOIN
schema2.mytable rvdii
ON dipi.id IS NOT DISTINCT FROM rvdii.id AND
dipi.foo IS NOT DISTINCT rvdii.foo AND
dipi.bar IS NOT DISTINCT FROM rvdii.bar
WHERE dipi.id IS DISTINCT FROM rvdii.id OR
dipi.foo IS DISTINCT FROM rvdii.foo OR
dipi.bar IS DISTINCT FROM rvdii.bar;
Because the code allows for NULL
matches in the ON
, you cannot check for NULL
for the WHERE
filtering condition.
Upvotes: 0
Reputation: 1133
I take that you want to see the rows that having the same id in both tables, have differences on the other two fields. If that's the case I think that the following would do the job:
SELECT dipi.id AS dipi_id,dipi.foo AS dipi_foo,dipi.bar AS dipi_bar, rvdii.id AS rvdii_id,rvdii.foo AS rvdii_foo,rvdii.bar AS rvdii_bar
FROM schema1.mytable dipi
INNER JOIN schema2.mytable rvdii on dipi.id = rvdii.id
WHERE (dipi.foo <> rvdii.foo OR (dipi.foo IS NULL AND rvdii.foo IS NOT NULL) OR (dipi.foo IS NOT NULL AND rvdii.foo IS NULL))
OR (dipi.bar <> rvdii.bar OR (dipi.bar IS NULL AND rvdii.bar IS NOT NULL) OR (dipi.bar IS NOT NULL AND rvdii.bar IS NULL))
Upvotes: 1