Reputation: 1599
I am comparing two tables to make sure they are same row by row and column by column on SQl server.
SELECT *
FROM t1, t2
WHERE t1.column1 = t2.column1 AND t1.column2 = t2.column2
AND t1.column3 = t2.column3 AND t1.column4 != t2.column4
The tables are vey large, more than 100 million.
I got error:
ERROR [HY000] ERROR: 9434 : Not enough memory for merge-style join
Are there better ways to do this comparison.
thanks !
Upvotes: 1
Views: 828
Reputation: 399
All of the above are good suggestions (My first try would be SELECT * FROM t1 EXCEPT SELECT * FROM t2
), but you indicate they all give the same out of memory error. Therefore I must conclude your tables are simply too large to perform the operation you desire all in one go. You'll have to run the query in stages, using a technique like one of the ones from "Equivalent of LIMIT and OFFSET for SQL Server?" I'd start with something like this (SQL Fiddle):
DECLARE @offset INT = 0
SELECT TOP 50000000 *
FROM (
SELECT *,
ROW_NUMBER() over (order by column1) AS r_n_n
FROM t1
) xx
WHERE r_n_n >= @offset
EXCEPT
SELECT TOP 50000000 *
FROM (
SELECT *,
ROW_NUMBER() over (order by column1) AS r_n_n
FROM t2
) xx
WHERE r_n_n >= @offset
Then you can increment @offset
by the amount of TOP n
and do it again. This will likely involve some trial and error to find the limit for the TOP n
clause that will run to completion rather than throw an error. I'd start with half, then try quarters, eighths, etc. as necessary.
Upvotes: 0
Reputation: 18472
Here's how I would do this: first, I assume you have primary keys on both tables. When you join those tables, the best way to join is using primary key fields, not all of them:
select t1.*, t2.*
from t1 join t2 on t1.id = t2.id
then you can compare those tables field-by-field without overloading sql:
select t1.*, t2.*
from t1 outer join t2 on t1.id = t2.id
where t1.field1 <> t2.field1 ot t1.field2 <> t2.field2 .....
the resulting records would be mismatches.
the code I wrote here is conceptual, I personally didn't run it on sql, so you might need to adjust
Upvotes: 0
Reputation: 11983
You could try EXCEPT
http://technet.microsoft.com/en-us/library/ms188055(v=sql.100).aspx
SELECT column1, column2, column3, column4 FROM t1
EXCEPT
SELECT column1, column2, column3, column4 FROM t2
Upvotes: 1
Reputation: 704
What if you try an INNER JOIN (and not select all the data from both tables)?
SELECT t1.column4, t2.column4
FROM t1 INNER JOIN t2 ON t1.column1 = t2.column1 AND t1.column2 = t2.column2
AND t1.column3 = t2.column3
WHERE t1.column4 != t2.column4
Do you want to identify all the rows that are different or just identify IF there are any rows that are different?
Upvotes: 0
Reputation: 69524
A much efficient way of checking the row by row difference will be using Exists operator.
Something like this....
SELECT *
FROM t1
WHERE NOT EXISTS (SELECT 1
FROM t2
WHERE t1.column1 = t2.column1
AND t1.column2 = t2.column2
AND t1.column3 = t2.column3
AND t1.column4 = t2.column4
)
Upvotes: 1