Reputation:
I am trying to compare two tables in SQL Server to verify some data. I want to return all the rows from both tables where data is in one table or the other, but not both. In essence, I want to show all the discrepancies. I need to check three pieces of data in doing so: FirstName, LastName and Product.
I'm fairly new to SQL, and it seems like a lot of the solutions I'm finding are overcomplicating things. I don't have to worry about NULLs.
I started by trying something like this:
SELECT DISTINCT [First Name], [Last Name], [Product Name] FROM [Temp Test Data]
WHERE ([First Name] NOT IN (SELECT [First Name]
FROM [Real Data]))
I'm having trouble taking this further though.
Thanks!
EDIT:
Based on the answer by @treaschf I have been trying to use a variation of the following query:
SELECT td.[First Name], td.[Last Name], td.[Product Name]
FROM [Temp Test Data] td FULL OUTER JOIN [Data] AS d
ON td.[First Name] = d.[First Name] AND td.[Last Name] = d.[Last Name]
WHERE (d.[First Name] = NULL) AND (d.[Last Name] = NULL)
But I keep getting 0 results back, when I know that there is at least 1 row in td that is not in d.
EDIT:
Ok, I think I figured it out. At least in my few minutes of testing it seems to work well enough.
SELECT [First Name], [Last Name]
FROM [Temp Test Data] AS td
WHERE (NOT EXISTS
(SELECT [First Name], [Last Name]
FROM [Data] AS d
WHERE ([First Name] = td.[First Name]) OR ([Last Name] = td.[Last Name])))
This is basically going to tell me what is in my test data that is not in my real data. Which is completely fine for what I need to do.
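For anyone who does need the discrepancies from both sides, here is a sketch of the same NOT EXISTS approach run in both directions, using the table and column names from the question and assuming a row only counts as matched when all three columns match:
SELECT 'Temp Test Data' AS [Source], td.[First Name], td.[Last Name], td.[Product Name]
FROM [Temp Test Data] AS td
WHERE NOT EXISTS (SELECT 1 FROM [Data] AS d
                  WHERE d.[First Name] = td.[First Name]
                    AND d.[Last Name] = td.[Last Name]
                    AND d.[Product Name] = td.[Product Name])
UNION ALL
SELECT 'Data' AS [Source], d.[First Name], d.[Last Name], d.[Product Name]
FROM [Data] AS d
WHERE NOT EXISTS (SELECT 1 FROM [Temp Test Data] AS td
                  WHERE td.[First Name] = d.[First Name]
                    AND td.[Last Name] = d.[Last Name]
                    AND td.[Product Name] = d.[Product Name])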
Upvotes: 281
Views: 982502
Reputation: 21
I was inspired by Thato's comment on this answer, where he linked to his dynamic solution. I found some great simplifications using techniques that weren't available 10 years ago when he created his version, so I want to share them here. His blog throws an error when I try to post there, so this will have to do.
I found using the JSON features of SQL Server to flatten all the columns and values very convenient. I was able to make it work with just one dynamic SQL statement, which I use twice to get the flattened data from each table. All the rest of the work is native SQL within the sproc.
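As a quick standalone illustration of that flattening idea (the table and column names here are hypothetical, not part of the sproc below):
-- Flatten each row of a (hypothetical) table into key/value pairs via JSON.
declare @json nvarchar(max) =
    (select Id, FirstName, LastName from dbo.SomeTable for json path)
select j.Id, kv.[key] as ColumnName, kv.[value] as ColumnValue
from openjson(@json) with (
    Id  int           '$.Id',
    Row nvarchar(max) '$' as json
) j
outer apply openjson(j.Row) kv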
A few notable things:
- Uses the insert-exec feature so we can avoid the challenges of temp tables and dealing with dynamic columns in the sproc
The sproc:
create or alter proc test.table_compare (
@table1 varchar(max), --> first table to compare; use schema and brackets if needed
@table2 varchar(max), --> second table to compare; use schema and brackets if needed
@keycol varchar(max), --> join key (no brackets, expected same name in both tables)
@ignore varchar(max) = null --> optional columns to ignore (comma-separated, no brackets)
) as
/*
Created by Keith Townsend, 5/16/2024
Inspired by work done by Thato Mantai found at https://thitos.blogspot.com/2014/03/compare-data-from-two-tables.html
Published as answer on StackOverflow: https://stackoverflow.com/a/78490999/18155330
Compares all data between two tables for all columns that exist in both tables.
The key column is assumed to be named the same in both tables. It would be easy to change this to specify both key columns if that ever becomes necessary.
The casing of the key column can be different in each table as long as it's spelled the same. We do a lookup to get the right casing for each table.
Result sets:
1. counts of null vs. good keys, in case your data has null keys that you didn't know about
2. keys found in only one of the tables
3. summary of affected fields with counts
4. distinct differences with counts
5. all values that do not match
FUTURE: we could return all of these result sets as a single JSON result so the caller can receive and query it
*/
--> we need this unquoted for proper comparison to sys.columns
set @keycol = parsename(@keycol, 1)
--> we need case-sensitive key column name for the openjson statement in @sql, so a quick lookup will get the right casing for each table
declare @keycol1 varchar(max) = (select name from sys.columns where object_id = object_id(@table1) and name = @keycol)
declare @keycol2 varchar(max) = (select name from sys.columns where object_id = object_id(@table2) and name = @keycol)
--> get columns that are the same between the two tables; ignore any that aren't in both tables
declare @columns varchar(max) = (
select name = string_agg(quotename(name), ', ') within group (order by name)
from ( select a.name
from sys.columns a
inner join sys.columns b
on a.object_id = object_id(@table1)
and b.object_id = object_id(@table2)
and a.name = b.name
left join (select trim(value) from string_split(@ignore, ',')) ex(name)
on ex.name = a.name
where ex.name is null --> filter excluded columns
) x
)
--> json manipulation is an easy way to unpivot all the fields & values for all the rows
declare @sql varchar(max) = '
declare @json varchar(max) = (select ' + @columns + ' from {{tablename}} for json path)
select [{{keycol}}], [key], [value], [type], [table] = ''{{tablename}}''
from openjson(@json) with (
[{{keycol}}] varchar(max),
Row nvarchar(max) ''$'' as JSON
)
outer apply openjson(Row)
'
--> table to capture results of execute()
declare @values table (
[key] varchar(max),
[column] varchar(max),
[value] varchar(max),
[type] int,
[table] varchar(max)
)
--> get data from first table
declare @sql1 varchar(max) = replace(replace(@sql, '{{tablename}}', @table1), '{{keycol}}', @keycol1)
insert @values execute(@sql1)
--> get data from second table
declare @sql2 varchar(max) = replace(replace(@sql, '{{tablename}}', @table2), '{{keycol}}', @keycol2)
insert @values execute(@sql2)
--> show counts of null vs. good keys for each table
select [table],
[null key] = count(case when [key] is null and [column] = @keycol then 1 end),
[good key] = count(case when [key] is not null and [column] = @keycol then 1 end)
from @values
group by [table]
--> show keys of unmatched records for each table
select x.[table],
[unmatched] = count(*),
[keys not found in other table] = string_agg(x.[key], ', ') within group (order by x.[key])
from (select distinct [table], [key] from @values where [table] = @table1) tk1
full outer join (select distinct [table], [key] from @values where [table] = @table2) tk2
on tk1.[key] = tk2.[key]
outer apply (select
coalesce(tk1.[table], tk2.[table]),
coalesce(tk1.[key], tk2.[key]),
case when tk1.[table] is null or tk2.[table] is null then 1 else 0 end
) x([table],[key], [unmatched])
where 1 = x.unmatched
group by x.[table]
--> capture deltas
drop table if exists #deltas
select [Key] = coalesce(t1.[key], t2.[key]),
[Column] = coalesce(t1.[column], t2.[column]),
Table1 = t1.Value,
Table2 = t2.Value
into #deltas
from (select * from @values where [table] = @table1) t1
full outer join (select * from @values where [table] = @table2) t2
on t1.[key] = t2.[key]
and t1.[column] = t2.[column]
where t1.Value != t2.Value
--> show what fields are affected with counts
select [Column], [Differences] = count(*) from #deltas group by [Column] order by 2 desc
--> show distinct differences with counts
select [Column], Table1, Table2, Records = count(*) from #deltas group by [Column], Table1, Table2 order by 4 desc
--> show all differences
declare @keytype varchar(20) = (select type_name(system_type_id) from sys.columns where object_id = object_id(@table1) and name = @keycol)
select * from #deltas order by
--> each sort has to be a single type, so to conditionally cast you have to have separate sorts and return null for disabled sorts
case when @keytype = 'int' then cast([key] as int) end, --> if key type is int, order by int cast
case when @keytype = 'bigint' then cast([key] as bigint) end, --> if key type is bigint, order by bigint cast
case when @keytype not in ('int', 'bigint') then [key] end, --> if key type is not int, order without cast
[column]
GO
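A call might look like this (the table and key-column names here are only illustrative):
exec test.table_compare
    @table1 = 'dbo.RealData',
    @table2 = 'dbo.TempTestData',
    @keycol = 'PersonId',
    @ignore = 'ModifiedDate'  --> optional; omit to compare every shared column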
Upvotes: 1
Reputation: 63
SQL UNION and EXCEPT operations can be used to compare entire datasets between source and target tables for data completeness and correctness. For example:
(SELECT * FROM source_table)
UNION
(SELECT * FROM target_table)
EXCEPT
(SELECT * FROM source_table INTERSECT SELECT * FROM target_table);
You can also try UNION ALL if you want the actual number of rows.
Upvotes: 0
Reputation: 17
I can't add comments because my reputation isn't high enough. That seems silly.
Replying to @Juan Velez in the top-voted answer, which gives the solution
( SELECT * FROM table1
EXCEPT
SELECT * FROM table2)
UNION ALL
( SELECT * FROM table2
EXCEPT
SELECT * FROM table1)
and Juan asked, "How do I see which tables had which rows?"
This part of the query selects all rows in table 1 that are not in table 2
( SELECT * FROM @table1
EXCEPT
SELECT * FROM @table2)
This part of the query selects all rows in table 2 that are not in table 1
( SELECT * FROM @table2
EXCEPT
SELECT * FROM @table1)
UNION ALL just puts them together so you can see, in a single result set, whether either table is missing rows that the other has. If you want to see the individual differences, try
INSERT INTO @table1NotTable2
SELECT * FROM @table1
EXCEPT
SELECT * FROM @table2
INSERT INTO @table2NotTable1
SELECT * FROM @table2
EXCEPT
SELECT * FROM @table1
And to achieve the same final result as his answer, use
SELECT * FROM @table1NotTable2
UNION ALL
SELECT * FROM @table2NotTable1
For people wanting this as a copy/paste template:
DECLARE @table1 TABLE (
)
DECLARE @table1NotTable2 TABLE (
)
DECLARE @table2 TABLE (
)
DECLARE @table2NotTable1 TABLE (
)
INSERT INTO @table1 (
)
INSERT INTO @table2 (
)
INSERT INTO @table1NotTable2
SELECT * FROM @table1
EXCEPT
SELECT * FROM @table2
INSERT INTO @table2NotTable1
SELECT * FROM @table2
EXCEPT
SELECT * FROM @table1
SELECT * FROM @table1NotTable2
SELECT * FROM @table2NotTable1
SELECT * FROM @table1NotTable2 UNION ALL SELECT * FROM @table2NotTable1
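Filled in with the question's three columns and a couple of sample rows (values borrowed from another answer below), the template might look like this:
DECLARE @table1 TABLE ([First Name] NVARCHAR(100), [Last Name] NVARCHAR(100), [Product Name] NVARCHAR(100))
DECLARE @table1NotTable2 TABLE ([First Name] NVARCHAR(100), [Last Name] NVARCHAR(100), [Product Name] NVARCHAR(100))
DECLARE @table2 TABLE ([First Name] NVARCHAR(100), [Last Name] NVARCHAR(100), [Product Name] NVARCHAR(100))
DECLARE @table2NotTable1 TABLE ([First Name] NVARCHAR(100), [Last Name] NVARCHAR(100), [Product Name] NVARCHAR(100))
INSERT INTO @table1 VALUES ('Al', 'Bundy', 'Ladies Size 12'), ('Peggy', 'Bundy', 'TV Guide')
INSERT INTO @table2 VALUES ('Al', 'Bundy', 'Ladies Size 13'), ('Peggy', 'Bundy', 'TV Guide')
INSERT INTO @table1NotTable2
SELECT * FROM @table1
EXCEPT
SELECT * FROM @table2
INSERT INTO @table2NotTable1
SELECT * FROM @table2
EXCEPT
SELECT * FROM @table1
SELECT * FROM @table1NotTable2 UNION ALL SELECT * FROM @table2NotTable1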
Upvotes: 0
Reputation: 43
This ought to work just fine: a SELECT query to find the difference between the row counts of two tables.
select (t1 - t2) as count_diff from (select (select count(*) from tbl1) t1, (select count(*) from tbl2) t2) as counts
Upvotes: 0
Reputation: 764
EXCEPT and NOT EXISTS are good ways to quickly find differences between sets of data, but I often want to know exactly which columns are different and how they are different.
What I find useful in these situations is using UNPIVOT to convert columns into key-value pairs that one can compare row for row.
In order to be useful though, there needs to be a way to match the records you want to compare, in your example something like a social security number or a "person id":
CREATE TABLE [RealData]
(
[PersonId] INT,
[FirstName] NVARCHAR(100),
[LastName] NVARCHAR(100),
[ProductName] NVARCHAR(100)
)
CREATE TABLE [TempTestData]
(
[PersonId] INT,
[FirstName] NVARCHAR(100),
[LastName] NVARCHAR(100),
[ProductName] NVARCHAR(100)
)
INSERT INTO [RealData] ([PersonId], [FirstName], [LastName], [ProductName])
VALUES
(1, 'Al', 'Bundy', 'Ladies Size 12'),
(2, 'Peggy', 'Bundy', 'TV Guide')
INSERT INTO [TempTestData] ([PersonId], [FirstName], [LastName], [ProductName])
VALUES
(1, 'Al', 'Bundy', 'Ladies Size 13'),
(2, 'Peggy', 'Bundy', 'TV Guide')
And the UNPIVOT with a couple of CTEs thrown in:
;WITH RealDataCte AS (
SELECT
'Real Data' AS [DataSource],
unpivotedRealData.*
FROM
(SELECT
CAST([PersonId] AS NVARCHAR(100)) AS [PersonId],
[FirstName],
[LastName],
[ProductName]
FROM [RealData]) AS realData
UNPIVOT
(ColumnValue FOR ColumnName IN ([FirstName], [LastName], [ProductName])) AS unpivotedRealData
),
TempTestDataCte AS (
SELECT
'Temp Test Data' AS [DataSource],
unpivotedTempTestData.*
FROM
(SELECT
CAST([PersonId] AS NVARCHAR(100)) AS [PersonId],
[FirstName],
[LastName],
[ProductName]
FROM [TempTestData]) AS tempTestData
UNPIVOT
(ColumnValue FOR ColumnName IN ([FirstName], [LastName], [ProductName])) AS unpivotedTempTestData
)
SELECT
RealDataCte.[DataSource],
RealDataCte.[ColumnName],
RealDataCte.[ColumnValue],
TempTestDataCte.[DataSource],
TempTestDataCte.[ColumnName],
TempTestDataCte.[ColumnValue],
CASE WHEN RealDataCte.[ColumnValue] <> TempTestDataCte.[ColumnValue] THEN 'YES' ELSE 'NO' END AS ColumnsDiffer
FROM RealDataCte
INNER JOIN
TempTestDataCte
ON RealDataCte.[ColumnName] = TempTestDataCte.[ColumnName]
AND RealDataCte.[PersonId] = TempTestDataCte.[PersonId]
WHERE
RealDataCte.[ColumnValue] <> TempTestDataCte.[ColumnValue]
The result: the two products are different.
This is great if you have many, many columns to compare over a large number of rows.
It can take a while to set up, though, and you do need to convert every column value to the same type, which can take a while (notice the CAST for PersonId).
If the dataset is really large, you may also want to use temp tables instead of CTEs.
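A minimal sketch of that swap, materializing the unpivoted rows into temp tables (the temp table names are made up):
SELECT [PersonId], ColumnName, ColumnValue
INTO #UnpivotedRealData
FROM (SELECT
          CAST([PersonId] AS NVARCHAR(100)) AS [PersonId],
          [FirstName],
          [LastName],
          [ProductName]
      FROM [RealData]) AS realData
UNPIVOT (ColumnValue FOR ColumnName IN ([FirstName], [LastName], [ProductName])) AS unpivotedRealData
-- ...repeat for [TempTestData] into #UnpivotedTempTestData, then join the two temp
-- tables on PersonId and ColumnName exactly as the CTEs are joined above.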
Upvotes: 1
Reputation: 1372
There can be performance issues with both the left join and the full join on large data sets.
In my opinion this is the best solution:
select [First Name], count(1) as e from
(select * from [Temp Test Data]
union all
select * from [Temp Test Data 2]) a
group by [First Name] having count(1) = 1
Upvotes: 1
Reputation: 2094
You can use EXCEPT, for example something like this:
-- Rows that DB1..Tb1 has but DB2..Tb1 does not
Select Col1,Col2,Col3 From DB1..Tb1
except
Select Col1,Col2,Col3 From DB2..Tb1
-- Now we change the order
-- Rows that DB2..Tb1 has but DB1..Tb1 does not
Select Col1,Col2,Col3 From DB2..Tb1
except
Select Col1,Col2,Col3 From DB1..Tb1
Upvotes: 4
Reputation: 5654
Presenting the Cadillac of Diffs as an SP. See within for the basic template, which was based on the answer by @erikkallen. It supports:
exec Common.usp_DiffTableRows '#t1', '#t2';
exec Common.usp_DiffTableRows
@pTable0 = 'ydb.ysh.table1',
@pTable1 = 'xdb.xsh.table2',
@pOrderByCsvOpt = null, -- Order the results
@pOnlyCsvOpt = null, -- Only compare these columns
@pIgnoreCsvOpt = null; -- Ignore these columns (ignored if @pOnlyCsvOpt is specified)
alter proc [Common].[usp_DiffTableRows]
@pTable0 varchar(300),
@pTable1 varchar(300),
@pOrderByCsvOpt nvarchar(1000) = null, -- Order the Results
@pOnlyCsvOpt nvarchar(4000) = null, -- Only compare these columns
@pIgnoreCsvOpt nvarchar(4000) = null, -- Ignore these columns (ignored if @pOnlyCsvOpt is specified)
@pDebug bit = 0
as
/*---------------------------------------------------------------------------------------------------------------------
Purpose: Compare rows between two tables.
Usage: exec Common.usp_DiffTableRows '#a', '#b';
Modified By Description
---------- ---------- -------------------------------------------------------------------------------------------
2015.10.06 crokusek Initial Version
2019.03.13 crokusek Added @pOrderByCsvOpt
2019.06.26 crokusek Support for @pIgnoreCsvOpt, @pOnlyCsvOpt.
2019.09.04 crokusek Minor debugging improvement
2020.03.12 crokusek Detect duplicate rows in either source table
---------------------------------------------------------------------------------------------------------------------*/
begin try
if (substring(@pTable0, 1, 1) = '#')
set @pTable0 = 'tempdb..' + @pTable0; -- object_id test below needs full names for temp tables
if (substring(@pTable1, 1, 1) = '#')
set @pTable1 = 'tempdb..' + @pTable1; -- object_id test below needs full names for temp tables
if (object_id(@pTable0) is null)
raiserror('Table name is not recognized: ''%s''', 16, 1, @pTable0);
if (object_id(@pTable1) is null)
raiserror('Table name is not recognized: ''%s''', 16, 1, @pTable1);
create table #ColumnGathering
(
Name nvarchar(300) not null,
Sequence int not null,
TableArg tinyint not null
);
declare
@usp varchar(100) = object_name(@@procid),
@sql nvarchar(4000),
@sqlTemplate nvarchar(4000) =
'
use $database$;
insert into #ColumnGathering
select Name, column_id as Sequence, $TableArg$ as TableArg
from sys.columns c
where object_id = object_id(''$table$'', ''U'')
';
set @sql = replace(replace(replace(@sqlTemplate,
'$TableArg$', 0),
'$database$', (select DatabaseName from Common.ufn_SplitDbIdentifier(@pTable0))),
'$table$', @pTable0);
if (@pDebug = 1)
print 'Sql #CG 0: ' + @sql;
exec sp_executesql @sql;
set @sql = replace(replace(replace(@sqlTemplate,
'$TableArg$', 1),
'$database$', (select DatabaseName from Common.ufn_SplitDbIdentifier(@pTable1))),
'$table$', @pTable1);
if (@pDebug = 1)
print 'Sql #CG 1: ' + @sql;
exec sp_executesql @sql;
if (@pDebug = 1)
select * from #ColumnGathering;
select Name,
min(Sequence) as Sequence,
convert(bit, iif(min(TableArg) = 0, 1, 0)) as InTable0,
convert(bit, iif(max(TableArg) = 1, 1, 0)) as InTable1
into #Columns
from #ColumnGathering
group by Name
having ( @pOnlyCsvOpt is not null
and Name in (select Value from Common.ufn_UsvToNVarcharKeyTable(@pOnlyCsvOpt, default)))
or
( @pOnlyCsvOpt is null
and @pIgnoreCsvOpt is not null
and Name not in (select Value from Common.ufn_UsvToNVarcharKeyTable(@pIgnoreCsvOpt, default)))
or
( @pOnlyCsvOpt is null
and @pIgnoreCsvOpt is null)
if (exists (select 1 from #Columns where InTable0 = 0 or InTable1 = 0))
begin
select 1; -- without this the debugging info doesn't stream sometimes
select * from #Columns order by Sequence;
waitfor delay '00:00:02'; -- give results chance to stream before raising exception
raiserror('Columns are not equal between tables, consider using args @pIgnoreCsvOpt, @pOnlyCsvOpt. See Result Sets for details.', 16, 1);
end
if (@pDebug = 1)
select * from #Columns order by Sequence;
declare
@columns nvarchar(4000) = --iif(@pOnlyCsvOpt is null and @pIgnoreCsvOpt is null,
-- '*',
(
select substring((select ',' + ac.name
from #Columns ac
order by Sequence
for xml path('')),2,200000) as csv
);
if (@pDebug = 1)
begin
print 'Columns: ' + @columns;
waitfor delay '00:00:02'; -- give results chance to stream before possibly raising exception
end
-- Based on https://stackoverflow.com/a/2077929/538763
-- - Added sensing for duplicate rows
-- - Added reporting of source table location
--
set @sqlTemplate = '
with
a as (select ~, Row_Number() over (partition by ~ order by (select null)) -1 as Duplicates from $a$),
b as (select ~, Row_Number() over (partition by ~ order by (select null)) -1 as Duplicates from $b$)
select 0 as SourceTable, ~
from
(
select * from a
except
select * from b
) anb
union all
select 1 as SourceTable, ~
from
(
select * from b
except
select * from a
) bna
order by $orderBy$
';
set @sql = replace(replace(replace(replace(@sqlTemplate,
'$a$', @pTable0),
'$b$', @pTable1),
'~', @columns),
'$orderBy$', coalesce(@pOrderByCsvOpt, @columns + ', SourceTable')
);
if (@pDebug = 1)
print 'Sql: ' + @sql;
exec sp_executesql @sql;
end try
begin catch
declare
@CatchingUsp varchar(100) = object_name(@@procid);
if (xact_state() = -1)
rollback;
-- Disabled for S.O. post
--exec Common.usp_Log
--@pMethod = @CatchingUsp;
--exec Common.usp_RethrowError
--@pCatchingMethod = @CatchingUsp;
throw;
end catch
go
create function Common.Trim
(
@pOriginalString nvarchar(max),
@pCharsToTrim nvarchar(50) = null -- specify null or 'default' for whitespace
)
returns table
with schemabinding
as
/*--------------------------------------------------------------------------------------------------
Purpose: Trim the specified characters from a string.
Modified By Description
---------- -------------- --------------------------------------------------------------------
2012.09.25 S.Rutszy/crok Modified from https://dba.stackexchange.com/a/133044/9415
--------------------------------------------------------------------------------------------------*/
return
with cte AS
(
select patindex(N'%[^' + EffCharsToTrim + N']%', @pOriginalString) AS [FirstChar],
patindex(N'%[^' + EffCharsToTrim + N']%', reverse(@pOriginalString)) AS [LastChar],
len(@pOriginalString + N'~') - 1 AS [ActualLength]
from
(
select EffCharsToTrim = coalesce(@pCharsToTrim, nchar(0x09) + nchar(0x20) + nchar(0x0d) + nchar(0x0a))
) c
)
select substring(@pOriginalString, [FirstChar],
((cte.[ActualLength] - [LastChar]) - [FirstChar] + 2)
) AS [TrimmedString]
--
--cte.[ActualLength],
--[FirstChar],
--((cte.[ActualLength] - [LastChar]) + 1) AS [LastChar]
from cte;
go
create function [Common].[ufn_UsvToNVarcharKeyTable] (
@pCsvList nvarchar(MAX),
@pSeparator nvarchar(1) = ',' -- can pass keyword 'default' when calling using ()'s
)
--
-- SQL Server 2012 distinguishes nvarchar keys up to maximum of 450 in length (900 bytes)
--
returns @tbl table (Value nvarchar(450) not null primary key(Value)) as
/*-------------------------------------------------------------------------------------------------
Purpose: Converts a comma separated list of strings into a sql NVarchar table. From
http://www.programmingado.net/a-398/SQL-Server-parsing-CSV-into-table.aspx
This may be called from RunSelectQuery:
GRANT SELECT ON Common.ufn_UsvToNVarcharTable TO MachCloudDynamicSql;
Modified By Description
---------- -------------- -------------------------------------------------------------------
2011.07.13 internet Initial version
2011.11.22 crokusek Support nvarchar strings and a custom separator.
2017.12.06 crokusek Trim leading and trailing whitespace from each element.
2019.01.26 crokusek Remove newlines
-------------------------------------------------------------------------------------------------*/
begin
declare
@pos int,
@textpos int,
@chunklen smallint,
@str nvarchar(4000),
@tmpstr nvarchar(4000),
@leftover nvarchar(4000),
@csvList nvarchar(max) = iif(@pSeparator not in (char(13), char(10), char(13) + char(10)),
replace(replace(@pCsvList, char(13), ''), char(10), ''),
@pCsvList); -- remove newlines
set @textpos = 1
set @leftover = ''
while @textpos <= len(@csvList)
begin
set @chunklen = 4000 - len(@leftover)
set @tmpstr = ltrim(@leftover + substring(@csvList, @textpos, @chunklen))
set @textpos = @textpos + @chunklen
set @pos = charindex(@pSeparator, @tmpstr)
while @pos > 0
begin
set @str = substring(@tmpstr, 1, @pos - 1)
set @str = (select TrimmedString from Common.Trim(@str, default));
insert @tbl (value) values(@str);
set @tmpstr = ltrim(substring(@tmpstr, @pos + 1, len(@tmpstr)))
set @pos = charindex(@pSeparator, @tmpstr)
end
set @leftover = @tmpstr
end
-- Handle @leftover
set @str = (select TrimmedString from Common.Trim(@leftover, default));
if @str <> ''
insert @tbl (value) values(@str);
return
end
GO
create function Common.ufn_SplitDbIdentifier(@pIdentifier nvarchar(300))
returns @table table
(
InstanceName nvarchar(300) not null,
DatabaseName nvarchar(300) not null,
SchemaName nvarchar(300),
BaseName nvarchar(300) not null,
FullTempDbBaseName nvarchar(300), -- non-null for tempdb (e.g. #Abc____...)
InstanceWasSpecified bit not null,
DatabaseWasSpecified bit not null,
SchemaWasSpecified bit not null,
IsCurrentInstance bit not null,
IsCurrentDatabase bit not null,
IsTempDb bit not null,
OrgIdentifier nvarchar(300) not null
) as
/*-----------------------------------------------------------------------------------------------------------
Purpose: Split a Sql Server Identifier into its parts, providing appropriate default values and
handling temp table (tempdb) references.
Example: select * from Common.ufn_SplitDbIdentifier('t')
union all
select * from Common.ufn_SplitDbIdentifier('s.t')
union all
select * from Common.ufn_SplitDbIdentifier('d.s.t')
union all
select * from Common.ufn_SplitDbIdentifier('i.d.s.t')
union all
select * from Common.ufn_SplitDbIdentifier('#d')
union all
select * from Common.ufn_SplitDbIdentifier('tempdb..#d');
-- Empty
select * from Common.ufn_SplitDbIdentifier('illegal name');
Modified By Description
---------- -------------- -----------------------------------------------------------------------------
2013.09.27 crokusek Initial version.
-----------------------------------------------------------------------------------------------------------*/
begin
declare
@name nvarchar(300) = ltrim(rtrim(@pIdentifier));
-- Return an empty table as a "throw"
--
--Removed for SO post
--if (Common.ufn_IsSpacelessLiteralIdentifier(@name) = 0)
-- return;
-- Find dots starting from the right by reversing first.
declare
@revName nvarchar(300) = reverse(@name);
declare
@firstDot int = charindex('.', @revName);
declare
@secondDot int = iif(@firstDot = 0, 0, charindex('.', @revName, @firstDot + 1));
declare
@thirdDot int = iif(@secondDot = 0, 0, charindex('.', @revName, @secondDot + 1));
declare
@fourthDot int = iif(@thirdDot = 0, 0, charindex('.', @revName, @thirdDot + 1));
--select @firstDot, @secondDot, @thirdDot, @fourthDot, len(@name);
-- Undo the reverse() (first dot is first from the right).
--
set @firstDot = iif(@firstDot = 0, 0, len(@name) - @firstDot + 1);
set @secondDot = iif(@secondDot = 0, 0, len(@name) - @secondDot + 1);
set @thirdDot = iif(@thirdDot = 0, 0, len(@name) - @thirdDot + 1);
set @fourthDot = iif(@fourthDot = 0, 0, len(@name) - @fourthDot + 1);
--select @firstDot, @secondDot, @thirdDot, @fourthDot, len(@name);
declare
@baseName nvarchar(300) = substring(@name, @firstDot + 1, len(@name) - @firstdot);
declare
@schemaName nvarchar(300) = iif(@firstDot - @secondDot - 1 <= 0,
null,
substring(@name, @secondDot + 1, @firstDot - @secondDot - 1));
declare
@dbName nvarchar(300) = iif(@secondDot - @thirdDot - 1 <= 0,
null,
substring(@name, @thirdDot + 1, @secondDot - @thirdDot - 1));
declare
@instName nvarchar(300) = iif(@thirdDot - @fourthDot - 1 <= 0,
null,
substring(@name, @fourthDot + 1, @thirdDot - @fourthDot - 1));
with input as (
select
coalesce(@instName, '[' + @@servername + ']') as InstanceName,
coalesce(@dbName, iif(left(@baseName, 1) = '#', 'tempdb', db_name())) as DatabaseName,
coalesce(@schemaName, iif(left(@baseName, 1) = '#', 'dbo', schema_name())) as SchemaName,
@baseName as BaseName,
iif(left(@baseName, 1) = '#',
(
select [name] from tempdb.sys.objects
where object_id = object_id('tempdb..' + @baseName)
),
null) as FullTempDbBaseName,
iif(@instName is null, 0, 1) InstanceWasSpecified,
iif(@dbName is null, 0, 1) DatabaseWasSpecified,
iif(@schemaName is null, 0, 1) SchemaWasSpecified
)
insert into @table
select i.InstanceName, i.DatabaseName, i.SchemaName, i.BaseName, i.FullTempDbBaseName,
i.InstanceWasSpecified, i.DatabaseWasSpecified, i.SchemaWasSpecified,
iif(i.InstanceName = '[' + @@servername + ']', 1, 0) as IsCurrentInstance,
iif(i.DatabaseName = db_name(), 1, 0) as IsCurrentDatabase,
iif(left(@baseName, 1) = '#', 1, 0) as IsTempDb,
@name as OrgIdentifier
from input i;
return;
end
GO
Upvotes: 5
Reputation: 14920
Simple variation on @erikkallen's answer that shows which table the row is present in:
( SELECT 'table1' as source, * FROM table1
EXCEPT
SELECT * FROM table2)
UNION ALL
( SELECT 'table2' as source, * FROM table2
EXCEPT
SELECT * FROM table1)
If you get the error
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
then it may help to add the source literal to both sides of each EXCEPT:
( SELECT 'table1' as source, * FROM table1
EXCEPT
SELECT 'table1' as source, * FROM table2)
UNION ALL
( SELECT 'table2' as source, * FROM table2
EXCEPT
SELECT 'table2' as source, * FROM table1)
Upvotes: 18
Reputation: 2175
To get all the differences between two tables, you can use this SQL query:
SELECT 'TABLE1-ONLY' AS SRC, T1.*
FROM (
SELECT * FROM Table1
EXCEPT
SELECT * FROM Table2
) AS T1
UNION ALL
SELECT 'TABLE2-ONLY' AS SRC, T2.*
FROM (
SELECT * FROM Table2
EXCEPT
SELECT * FROM Table1
) AS T2
;
Upvotes: 24
Reputation: 29
This will do the trick; it is similar to Tiago's solution but returns the "source" table as well.
select [First Name], [Last Name], max(_tabloc) as _tabloc
from (
select [First Name], [Last Name], 't1' as _tabloc from table1
union all
select [First Name], [Last Name], 't2' as _tabloc from table2
) v
group by [First Name], [Last Name]
having count(1)=1
The result will contain the differences between the tables; the _tabloc column tells you which table each row came from.
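To cover the Product column from the question as well, the same pattern just needs the extra column in the select list, the union and the group by:
select [First Name], [Last Name], [Product Name], max(_tabloc) as _tabloc
from (
select [First Name], [Last Name], [Product Name], 't1' as _tabloc from table1
union all
select [First Name], [Last Name], [Product Name], 't2' as _tabloc from table2
) v
group by [First Name], [Last Name], [Product Name]
having count(1)=1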
Upvotes: 2
Reputation: 59
For a simple smoke test where you're trying to ensure two tables match without worrying about column names:
--ensure tables have matching records
Select count (*) from tbl_A
Select count (*) from tbl_B
--create temp table of all records in both tables
Select * into #demo from tbl_A
Union All
Select * from tbl_B
--Distinct #demo records = Total #demo records/2 = Total tbl_A records = total tbl_B records
Select distinct * from #demo
You can easily write a stored procedure to compare a batch of tables.
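A rough sketch of what such a procedure could look like (the procedure name is made up, it builds dynamic SQL, and both tables are assumed to have union-compatible columns):
CREATE OR ALTER PROCEDURE dbo.usp_SmokeTestCompare
    @tableA sysname,
    @tableB sysname
AS
BEGIN
    -- Mirrors the manual steps above: row counts per table, plus the total and
    -- distinct counts of the combined rows. Assumes matching column layouts.
    DECLARE @sql nvarchar(max) = N'
        SELECT
            (SELECT COUNT(*) FROM ' + QUOTENAME(@tableA) + N') AS rows_a,
            (SELECT COUNT(*) FROM ' + QUOTENAME(@tableB) + N') AS rows_b,
            (SELECT COUNT(*) FROM (SELECT * FROM ' + QUOTENAME(@tableA) + N'
                                   UNION ALL
                                   SELECT * FROM ' + QUOTENAME(@tableB) + N') u) AS total_rows,
            (SELECT COUNT(*) FROM (SELECT * FROM ' + QUOTENAME(@tableA) + N'
                                   UNION
                                   SELECT * FROM ' + QUOTENAME(@tableB) + N') d) AS distinct_rows;';
    EXEC sys.sp_executesql @sql;
END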
Upvotes: 1
Reputation: 117561
If you want to know which column values are different, you could use the Entity-Attribute-Value model:
declare @Data1 xml, @Data2 xml
select @Data1 =
(
select *
from (select * from Test1 except select * from Test2) as a
for xml raw('Data')
)
select @Data2 =
(
select *
from (select * from Test2 except select * from Test1) as a
for xml raw('Data')
)
;with CTE1 as (
select
T.C.value('../@ID', 'bigint') as ID,
T.C.value('local-name(.)', 'nvarchar(128)') as Name,
T.C.value('.', 'nvarchar(max)') as Value
from @Data1.nodes('Data/@*') as T(C)
), CTE2 as (
select
T.C.value('../@ID', 'bigint') as ID,
T.C.value('local-name(.)', 'nvarchar(128)') as Name,
T.C.value('.', 'nvarchar(max)') as Value
from @Data2.nodes('Data/@*') as T(C)
)
select
isnull(C1.ID, C2.ID) as ID, isnull(C1.Name, C2.Name) as Name, C1.Value as Value1, C2.Value as Value2
from CTE1 as C1
full outer join CTE2 as C2 on C2.ID = C1.ID and C2.Name = C1.Name
where
not
(
C1.Value is null and C2.Value is null or
C1.Value is not null and C2.Value is not null and C1.Value = C2.Value
)
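The snippet above assumes both tables share an ID column that identifies matching rows. A minimal setup to try it might be (table names match the snippet; the data is made up):
CREATE TABLE Test1 (ID bigint, FirstName nvarchar(100), Product nvarchar(100));
CREATE TABLE Test2 (ID bigint, FirstName nvarchar(100), Product nvarchar(100));
INSERT INTO Test1 VALUES (1, 'Al', 'Ladies Size 12'), (2, 'Peggy', 'TV Guide');
INSERT INTO Test2 VALUES (1, 'Al', 'Ladies Size 13'), (2, 'Peggy', 'TV Guide');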
Upvotes: 8
Reputation: 1026
I know this may not be a popular answer, but I agree with @Randy Minder on using a third-party tool when a more complex comparison is needed.
This specific case is easy, and such tools are not needed for it, but things can get complex quickly if you introduce more columns, databases on two servers, more complex comparison criteria and such.
There are a lot of these tools, such as ApexSQL Data Diff or Quest Toad, and you can always use them in trial mode to get the job done.
Upvotes: 39
Reputation: 34421
( SELECT * FROM table1
EXCEPT
SELECT * FROM table2)
UNION ALL
( SELECT * FROM table2
EXCEPT
SELECT * FROM table1)
Upvotes: 347
Reputation: 1720
Try this :
SELECT
[First Name], [Last Name]
FROM
[Temp Test Data] AS td EXCEPTION JOIN [Data] AS d ON
(d.[First Name] = td.[First Name] OR d.[Last Name] = td.[Last Name])
Much simpler to read.
Upvotes: 2
Reputation: 5944
If you have tables A and B, both with column C, here are the records which are present in table A but not in B:
SELECT A.*
FROM A
LEFT JOIN B ON (A.C = B.C)
WHERE B.C IS NULL
To get all the differences with a single query, a full join must be used, like this:
SELECT A.*, B.*
FROM A
FULL JOIN B ON (A.C = B.C)
WHERE A.C IS NULL OR B.C IS NULL
What you need to know in this case is that when a record can be found in A but not in B, the columns which come from B will be NULL, and similarly for those which are present in B and not in A, the columns from A will be NULL.
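If you also want the output to say which side each row comes from, a CASE over those NULL columns does it (a sketch built on the same full join):
SELECT COALESCE(A.C, B.C) AS C,
       CASE WHEN B.C IS NULL THEN 'only in A'
            WHEN A.C IS NULL THEN 'only in B'
       END AS FoundIn
FROM A
FULL JOIN B ON (A.C = B.C)
WHERE A.C IS NULL OR B.C IS NULL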
Upvotes: 291