user3448011
user3448011

Reputation: 1599

comparing two tables to make sure they are same row by row and column by column on SQl server

I am comparing two tables to make sure they are same row by row and column by column on SQl server.

      SELECT  *
        FROM t1, t2
        WHERE t1.column1 = t2.column1 AND t1.column2 = t2.column2
      AND t1.column3 = t2.column3 AND t1.column4 != t2.column4

The tables are vey large, more than 100 million.

I got error:

    ERROR [HY000] ERROR:  9434 : Not enough memory for merge-style join

Are there better ways to do this comparison.

thanks !

Upvotes: 1

Views: 828

Answers (5)

ob1quixote
ob1quixote

Reputation: 399

All of the above are good suggestions (My first try would be SELECT * FROM t1 EXCEPT SELECT * FROM t2), but you indicate they all give the same out of memory error. Therefore I must conclude your tables are simply too large to perform the operation you desire all in one go. You'll have to run the query in stages, using a technique like one of the ones from "Equivalent of LIMIT and OFFSET for SQL Server?" I'd start with something like this (SQL Fiddle):

DECLARE @offset INT = 0
SELECT TOP 50000000 * 
  FROM (
        SELECT *, 
               ROW_NUMBER() over (order by column1) AS r_n_n 
          FROM t1
        ) xx 
  WHERE r_n_n >= @offset
EXCEPT
 SELECT TOP 50000000 * 
   FROM (
         SELECT *, 
                ROW_NUMBER() over (order by column1) AS r_n_n 
           FROM t2
        ) xx 
  WHERE r_n_n >= @offset

Then you can increment @offset by the amount of TOP n and do it again. This will likely involve some trial and error to find the limit for the TOP n clause that will run to completion rather than throw an error. I'd start with half, then try quarters, eighths, etc. as necessary.

Upvotes: 0

galets
galets

Reputation: 18472

Here's how I would do this: first, I assume you have primary keys on both tables. When you join those tables, the best way to join is using primary key fields, not all of them:

select t1.*, t2.*
from t1 join t2 on t1.id = t2.id

then you can compare those tables field-by-field without overloading sql:

select t1.*, t2.*
from t1 outer join t2 on t1.id = t2.id
where t1.field1 <> t2.field1 ot t1.field2 <> t2.field2 .....

the resulting records would be mismatches.

the code I wrote here is conceptual, I personally didn't run it on sql, so you might need to adjust

Upvotes: 0

Malk
Malk

Reputation: 11983

You could try EXCEPT http://technet.microsoft.com/en-us/library/ms188055(v=sql.100).aspx

SELECT column1, column2, column3, column4 FROM t1
EXCEPT
SELECT column1, column2, column3, column4 FROM t2

Upvotes: 1

Dave Jemison
Dave Jemison

Reputation: 704

What if you try an INNER JOIN (and not select all the data from both tables)?

SELECT  t1.column4, t2.column4
FROM t1 INNER JOIN t2 ON t1.column1 = t2.column1 AND t1.column2 = t2.column2
  AND t1.column3 = t2.column3
WHERE t1.column4 != t2.column4

Do you want to identify all the rows that are different or just identify IF there are any rows that are different?

Upvotes: 0

M.Ali
M.Ali

Reputation: 69524

A much efficient way of checking the row by row difference will be using Exists operator.

Something like this....

SELECT  *
FROM t1
WHERE NOT EXISTS (SELECT 1
                  FROM  t2 
                  WHERE t1.column1 = t2.column1 
                    AND t1.column2 = t2.column2
                    AND t1.column3 = t2.column3 
                    AND t1.column4 = t2.column4
                  )

Upvotes: 1

Related Questions