Reputation: 13
I am dealing with some excel files that I need to find the 'differences' between. I started by creating 2 SQLite tables with them, programatically. There are two files that have the same columns, and have matching rows. The problem is that the rows are not necessarily in order, and there is no one key column that could match the rows. There are, however, four columns that I can join on at the same time.
My current solution to find the differences between the two tables is this query:
SELECT `field1`, `field2`, etc FROM table1
WHERE NOT EXISTS (SELECT 1 FROM table2
WHERE table2.`SID` = table1.`SID`
AND table2.`BID` = table1.`BID`
AND table2.`OID` = table1.`OID`
AND table2.`LID` = table1.`LID`);
I'd like the result to be the fields from table1 AND the fields from table2 for each record, so I end up with double the columns.
field1 | field2 | field3 |
---|---|---|
abc | def | ghi |
jkl | mno | pqr |
field1 | field2 | field3 |
---|---|---|
abc | def | AAA |
jkl | mno | pqr |
to one table with all columns showing only differing rows:
t1_field1 | t1_field2 | t1_field3 | t2_field1 | t2_field2 | t2_field3 |
---|---|---|---|---|---|
abc | def | ghi | abc | def | AAA |
I think a better approach than some complex sql is to retroactively create a foreign key for both of the tables so that making a query like that would be much simpler.
Remember that I create these tables and insert data programatically, so if the solution lies during the creation, I can still implement your suggestions.
EDIT: As @marcos pointed out, a foreign key would be impossible due to the matching row not being guaranteed to exist. A compound join sounds like a good path.
My intention is to find any differences at all, including the deletion of a row, from table1 to table2. I don't need the differences from 2 to 1.
This query @marcos recommended (minus the t2 - t1 union) almost works perfectly:
SELECT t1.field1, t1.field2, ..., t2.field1, t2.field2
FROM table1 t1
LEFT JOIN table2 t2
ON t1.SID = t2.SID
AND t1.BID = t2.BID
AND t1.OID = t2.OID
AND t1.LID = t2.LID
WHERE t2.SID IS NULL
returning something like:
t1_field1 | t1_field2 | t1_field3 | t2_field1 | t2_field2 | t2_field3 |
---|---|---|---|---|---|
abc | def | ghi | --- | --- | --- |
But unfortunately the columns from table2 do not populate. I'm also unsure of how the WHERE t2.SID IS NULL
works here, since SID is not a unique key.
Thanks in advance! :)
Upvotes: 0
Views: 190
Reputation: 153
If I understood correctly, you want all the lines in table1 that does not match table2 plus all the lines in table2 that does not match t1.
SELECT t1.field1, t1.field2, ..., t2.field1, t2.field2
FROM table1 t1
LEFT JOIN table2 t2
ON t1.SID = t2.SID
AND t1.BID = t2.BID
AND t1.OID = t2.OID
AND t1.LID = t2.LID
WHERE t2.SID IS NULL
UNION ALL
SELECT t1.field1, t1.field2, ..., t2.field1, t2.field2
FROM table2 t2
LEFT JOIN table1 t1
ON t1.SID = t2.SID
AND t1.BID = t2.BID
AND t1.OID = t2.OID
AND t1.LID = t2.LID
WHERE t1.SID IS NULL
Create an index on those four columns (SID, BID, OID and LID) to improve performance.
Foreign key is not possible, because they enforce that the data on one table is present in another table. You will get a fk constraint error if you try to create one.
Upvotes: 1