How to identify cells where two rows differ?

Question

Sometimes I wonder if there is a good practise to identify the cell(s) in which two rows of a database table differ when they are supposed to be eqal. This is a very simple scenario for the problem:

Two tables with one row each:

select 1 c1, 2 c2, 3 c3 into #t1
select 1 c1, 2 c2, 4 c3 into #t2

The two rows are identified as different (It is not my question how to identify these rows. In this szenario the doublicate is identified by the union-operator. I could use a binary-checksum or whatever. This is not the question.)

select * from #t1
union
select * from #t2

In this case it is obvious (very simple and fast) to see that c3 (value 3 and 4)makes the difference between the two rows which are supposed to be equal.

1   2   3
1   2   4

How to identify the mismatching cells, when there are hunderts of columns? (Don't tell me not to create such tables...)

Is there a generic way? I'd prefer some kind of 'highlightning' the cells that differ. (Maybe use R-in-database and export to excel?...)

Another way to expain the problem:

I have a table where id should be an unique identifier - but it is not. I have to analyze the data in the table.

 if object_id('tempdb..#t1') is not null drop table #t1

 create table #t1 (
 id int,
 c01 int,
 c02 int,
 c03 int,
 c04 int,
 c05 int,
 c06 int,
 c07 int,
 c08 int,
 c09 int,
 c10 int,
 c11 int,
 c12 int,
 c13 int,
 c14 int,
 c15 int,
 c16 int,
 c17 int,
 c18 int,
 c19 int,
 c20 int,
 c21 int,
 c22 int,
 c23 int,
 c24 int,
 c25 int,
 c26 int,
 c27 int,
 c28 int,
 c29 int,
 c30 int,
 c31 int,
 c32 int,
 c33 int,
 c34 int,
 c35 int,
 c36 int,
 c37 int,
 c38 int,
 c39 int,
 c40 int,
 c41 int,
 c42 int,
 c43 int,
 c44 int,
 c45 int,
 c46 int,
 c47 int,
 c48 int,
 c49 int
 )

 insert #t1 (id, c11) values (1, 1)
 insert #t1 (id, c12) values (1, 1)

 insert #t1 (id, c11) values (2, 1)
 insert #t1 (id, c11) values (3, 1)

 insert #t1 (id, c21) values (4, 1)
 insert #t1 (id, c32) values (4, 1)

That's what I do: I have a look at the data "with problems", that is data with differnt values in some cells. This task is trivial when there are very few columns. When there are hundrets of columns (and not nearly everything is NULL like in this example) this a a demanding task.

 select a.* from #t1 a
 inner join (select id from #t1 group by id having count(*) > 1) b
 on a.id = b.id
 order by id

I would like to see the name of the cells that differ. In this example for example a resultset like this one would do.

 id columnname
 1  c11
 1  c12
 4  c21
 4  c32

Yellowfilled cells in SSMS would also be fine...

John Cappelletti · Accepted Answer

The following will "dynamically" unpivot your data by using a little XML.

UnPivot would be more performant, but at least here you don't have to specify all the fields

Example

;with cte as (
Select A.ID
      ,C.*
      ,Fields = sum(1) over (Partition by A.ID)
      ,Vals   = sum(1) over (Partition by A.ID,Field,Value)
 From #t1 A
 Cross Apply ( values (cast((Select A.* for XML RAW) as xml))) B(XMLData)
 Cross Apply (
                Select Field = a.value('local-name(.)','varchar(100)')
                      ,Value = a.value('.','varchar(max)') 
                 From  B.XMLData.nodes('/row')  as C1(n)
                 Cross Apply C1.n.nodes('./@*') as C2(a)
                 Where a.value('local-name(.)','varchar(100)') not in ('id','OtherExcludeColumns')
             ) C
) 
Select ID
      ,Field
 From  cte
 Where Fields>1 and Vals=1

Returns

ID  Field
1   c11
1   c12
4   c21
4   c32

EDIT

A shortcut just to find rows which have different values

Select A.ID
 From  #T1 A
 Cross Apply ( values ((Select A.* for XML RAW) )) B(XMLData) 
 Group by A.ID
 Having min(XMLData)<>max(XMLData)

Returns

ID
1
4

How to identify cells where two rows differ?

Answers (2)

Related Questions