Oxed Frederik
Oxed Frederik

Reputation: 1401

SQL Server compare results of two queries that should be identical

I am modifying a sql server 2005 stored procedure slightly for performance, and I would like to quickly make sure the old stored proc and the new one return the exact same results (the columns are the same, I want to make sure the rows are the same).

Is there a simple way to do this in sql server 2005?

Upvotes: 66

Views: 112005

Answers (9)

ZenZei
ZenZei

Reputation: 1

Some answers above are lacking duplicate-row checks, and others don’t cover everything either.

Here's a better, more complete solution. It checks for duplicates, and also points if the differences are due to missing rows, or duplicate rows, and how many are missing.

drop table if exists #t1;
drop table if exists #t2;

create table #t1 (a int);
create table #t2 (a int);

-- sample values: some missing, some duplicates
insert into #t1 values (1),(1),(2)            ,(3)       ;
insert into #t2 values (1)    ,(2),(2),(2),(2),   (4),(4);


--- replace the 6 instances of "a" below with a list of all the columns in the tables
with
  t1_ordered as (select *, ROW_NUMBER() over (partition by a order by a) as __count from #t1)
, t2_ordered as (select *, ROW_NUMBER() over (partition by a order by a) as __count from #t2)
, missing_from_t2 as ( select * from t1_ordered except select * from t2_ordered)
, missing_from_t1 as ( select * from t2_ordered except select * from t1_ordered)
, pre_final_count as ( select *, 'T2' as __not_in_table from missing_from_t2
                                     union all
                                     select *, 'T1' as __not_in_table from missing_from_t1
                                  )
select 
       __not_in_table, 
       case when MIN(__count) = 1 then 'missing full row' else 'missing duplicates' end as __missing_type,
       MAX(__count) - MIN(__count) + 1  as __number_of_missing_rows,
       a
from pre_final_count
group by 
       __not_in_table ,
       a
;

Results

Now, this might not be totally ideal with very large tables, due to the multiple on-the-fly sorts and joins that the query plan has. For large tables, pre-creating sorted+numbered temporary tables, indexing, and then full-outer-joining them could be more efficient. Like the following:

--- replace the instances of "a" below with a list of all the columns in the tables
drop table if exists #t1_ordered;
drop table if exists #t2_ordered;
select *, ROW_NUMBER() over (partition by a order by a) as __count1 into #t1_ordered from #t1;
select *, ROW_NUMBER() over (partition by a order by a) as __count2 into #t2_ordered from #t2;
-- add some indexes
create clustered columnstore index __t1_ordered_CCI on #t1_ordered;
create clustered columnstore index __t2_ordered_CCI on #t2_ordered;
select 
       * 
from #t1_ordered 
       full outer join #t2_ordered
              on #t1_ordered.a = #t2_ordered.a --add lines to join all the t1 and t2 columns here if more than 1 column
                 and #t1_ordered.__count1 = #t2_ordered.__count2
where __count1 is null or __count2 is null ;

Which produces a quite different execution plan, in this case the final step is a beautiful hash match over CCIs which is quite efficient on large tables.

Result (not as pretty – it could be then aggregated / made prettier like the simpler example above, by coalesce-ing + grouping)

Results

Of course, all this could be made dynamic (to avoid having to replace the “a” below with all the columns in the table), but you get the idea.

Just my two cents!

Upvotes: 0

AD415
AD415

Reputation: 1

I found this query works the best in T-SQL to compare two queries and to know which table the results are coming from

WITH CTE AS
 (SELECT insert query 1

EXCEPT

SELECT insert query 2)

SELECT insert query 2, 'a'

EXCEPT

SELECT insert query 1, 'a'

UNION ALL

SELECT *, 'b'  FROM cte

Upvotes: 0

tpvasconcelos
tpvasconcelos

Reputation: 717

To complete @jabs answer, you can use the following template to get the difference between two queries. It also adds an extra column (diff_description) that explains from which query each row is missing.

with q1 as (
    <INSERT_Q1_HERE>
)
, q2 as (
    <INSERT_Q2_HERE>
)
, missing_from_q2 as (
    select *
    from (
        select * from q1 
        except 
        select * from q2
    )
    cross join (select 'missing from q2' as diff_description)
)
, missing_from_q1 as (
    select *
    from (
        select * from q2 
        except
        select * from q1
    )
    cross join (select 'missing from q1' as diff_description)
)
select * from missing_from_q2
union all
select * from missing_from_q1

Upvotes: 14

Here some more verbose aproach that helped me while studying the matter on MSSQL. @tpvasconcelos´s answer is just the most correct so far.

DECLARE @AAA TABLE(id bigint NOT NULL) 
    INSERT INTO @AAA
    VALUES (1),(2),(3),(4),(5),(6),(7)

DECLARE @bbb TABLE(id bigint NOT NULL)
    INSERT INTO @bbb
    VALUES (1),(2),(3),(4),(5),(6),(7)

Declare @diff int = (SELECT COUNT(*) FROM (SELECT * FROM @AAA EXCEPT SELECT * FROM @bbb) AS TB)


Declare @aux1 int;
set @aux1 = (select count(*) from @BBB);

Declare @aux2 int;
set @aux2 = (SELECT COUNT(*)  FROM @AAA)

Declare @aux3 int;
set @aux3 = (SELECT COUNT(*) FROM (select * from @AAA union SELECT * FROM @bbb) as tb);  -- for union to work it needs a alias


IF @diff <> 0
    begin
        PRINT 'Flow @flows_name has failed.'
    end
else
    begin
        IF @aux1 = @aux3
            begin
                PRINT 'Flow @flows_name might have SUCCEEDED!'
            end
        else
            begin
                PRINT 'Flow @flows_name has failed.'
            end
    end

Upvotes: 0

Antonio Rodr&#237;guez
Antonio Rodr&#237;guez

Reputation: 1126

EXCEPT is the key to compare two querys (as @jabs said).

SELECT count(*), * FROM "query 1 here"
EXCEPT
SELECT count(*), * FROM "query 2 here"

Adding count(*) for each query to make sure both have the same results. Just in case there are some repeated rows which are deleted by except.

Upvotes: 7

CodeCowboyOrg
CodeCowboyOrg

Reputation: 3043

The stored proc below will compare the output resultset of 2 stored procedures, or 2 statements. The key here is the SP does not need to know the structure or schema of the result set, thus you can arbitrarily test any SP. It will return 0 rows if the output is the same. This solution uses openrowset command in SQL Server. Here is some sample usage of the Stored proc

DECLARE @SQL_SP1 VARCHAR(MAX)
DECLARE @SQL_SP2 VARCHAR(MAX)

-- Compare results of 2 Stored Procs
SET @SQL_SP1 = 'EXEC SomeDB.dbo.[usp_GetWithTheProgram_OLD] 100, ''SomeParamX'''
SET @SQL_SP1 = 'EXEC SomeDB.dbo.[usp_GetWithTheProgram_NEW] 50, ''SomeParamX'''
EXEC utlCompareStatementResults @SQL_SP1, @SQL_SP2

-- Compare just 2 SQL Statements
SET @SQL_SP1 = 'SELECT * FROM SomeDB.dbo.Table1 WHERE CreatedOn > ''2016-05-08'''
SET @SQL_SP1 = 'SELECT * FROM SomeDB.dbo.Table1 WHERE CreatedOn > ''2016-06-11'''
EXEC utlCompareStatementResults @SQL_SP1, @SQL_SP2

The SP requires the following prerequisites, which may not be ideal for a production environment, but very useful for local QA, DEV and Test environments. It uses openrowset in the code.

EXEC sp_configure 'show advanced options', 1
EXEC sp_configure 'ad hoc distributed queries', 1
EXEC sp_serveroption @@SERVERNAME, 'DATA ACCESS', TRUE

Here is the code for the stored proc.

==================================================================================
    --== SUMMARY utlCompareStatementResults
    --==    - requires sp_configure 'show advanced options', 1
    --==    - requires sp_configure 'ad hoc distributed queries', 1
    --==    - maybe requires EXEC sp_serveroption @@SERVERNAME, 'DATA ACCESS', TRUE
    --==    - requires the RecordSet Output to have Unique ColumnNames (no duplicate columns)
    --==    - requires references in straight SQL to be fully qualified [dbname].[schema].[objects] but not within an SP
    --==    - requires references SP call to be fully qualifed [dbname].[schema].[spname] but not objects with the SP
    --== OUTPUT
    --==    Differences are returned 
    --==    If there is no recordset returned, then theres no differences
    --==    However if you are comparing 2 empty recordsets, it doesn't mean anything
    --== USAGE
    --==   DECLARE @SQL_SP1 VARCHAR(MAX)
    --==   DECLARE @SQL_SP2 VARCHAR(MAX)
    --==   -- Compare just 2 SQL Statements
    --==   SET @SQL_SP1 = 'SELECT * FROM SomeDB.dbo.Table1 WHERE CreatedOn > ''2016-05-08'''
    --==   SET @SQL_SP1 = 'SELECT * FROM SomeDB.dbo.Table1 WHERE CreatedOn > ''2016-06-11'''
    --==   EXEC utlCompareStatementResults @SQL_SP1, @SQL_SP2
    --==
    --==   -- Compare results of 2 Stored Procs
    --==   SET @SQL_SP1 = 'EXEC SomeDB.dbo.[usp_GetWithTheProgram_OLD] 100, ''SomeParamX'''
    --==   SET @SQL_SP1 = 'EXEC SomeDB.dbo.[usp_GetWithTheProgram_NEW] 50, ''SomeParamX'''
    --==   EXEC utlCompareStatementResults @SQL_SP1, @SQL_SP2
    --==================================================================================
    CREATE PROCEDURE utlCompareStatementResults
       @SQL_SP1 VARCHAR(MAX),
       @SQL_SP2 VARCHAR(MAX)
    AS
    BEGIN
        DECLARE @TABLE1 VARCHAR(200)
        DECLARE @TABLE2 VARCHAR(200)
        DECLARE @SQL_OPENROWSET VARCHAR(MAX) 
        DECLARE @CONNECTION VARCHAR(100)

        SET @CONNECTION = 'server='+@@SERVERNAME+';Trusted_Connection=yes'

        SET @SQL_SP1 = REPLACE(@SQL_SP1, '''','''''')
        SET @SQL_SP2 = REPLACE(@SQL_SP2, '''','''''')

        SET @TABLE1 = '#' + SUBSTRING(CONVERT(VARCHAR(250),NEWID()), 1, 8)
        SET @TABLE2 = '#' + SUBSTRING(CONVERT(VARCHAR(250),NEWID()), 1, 8)

        SET @SQL_OPENROWSET =
        'SELECT * ' + ' ' +
        'INTO ' + @TABLE1 + ' ' +
        'FROM OPENROWSET(''SQLNCLI'', ' + '''' + @CONNECTION + '''' +
                        ',''' + @SQL_SP1 +'''); ' +
        'SELECT * ' + ' ' +
        'INTO ' + @TABLE2 + ' ' +
        'FROM OPENROWSET(''SQLNCLI'', ' + '''' + @CONNECTION + '''' +
                        ',''' + @SQL_SP2 +'''); ' +
        '(SELECT * FROM ' + @TABLE1 + ' EXCEPT SELECT * FROM ' + @TABLE2 + ') '  +
        ' UNION ALL ' +
        '(SELECT * FROM ' + @TABLE2 + ' EXCEPT SELECT * FROM ' + @TABLE1 + '); ' +
        'DROP TABLE ' + @TABLE1 + '; ' +
        'DROP TABLE ' + @TABLE2 + '; '
        PRINT @SQL_OPENROWSET
        EXEC (@SQL_OPENROWSET)
        PRINT 'DifferenceCount: ' + CONVERT(VARCHAR(100), @@ROWCOUNT)
    END

Upvotes: 5

David Brabant
David Brabant

Reputation: 43499

create table #OldProcResults (
    <Blah>
)

create table #NewProcResults (
    <Blih>
)

insert into #OldProcResults
    exec MyOldProc

insert into #NewProcResults
    exec MyNewProc

then use Jabs' answer to compare the two tables.

Upvotes: 2

jabs
jabs

Reputation: 1848

you can use the except construct to match between the two queries.

select * from (select * from query1) as query1
except
select * from (select * from query2) as query2

EDIT:

Then reverse the query to find differences with query2 as the driver:

select * from (select * from query2) as query2
except
select * from (select * from query1) as query1

Upvotes: 86

Randy
Randy

Reputation: 16677

create two temp tables, one for each procedure. run the procedure to insert rows into the appropriate table.

then select * from one MINUS select * from the other and visa-versa

Upvotes: 0

Related Questions