Christian4145
Christian4145

Reputation: 543

How to identify cells where two rows differ?

Sometimes I wonder if there is a good practise to identify the cell(s) in which two rows of a database table differ when they are supposed to be eqal. This is a very simple scenario for the problem:

Two tables with one row each:

select 1 c1, 2 c2, 3 c3 into #t1
select 1 c1, 2 c2, 4 c3 into #t2

The two rows are identified as different (It is not my question how to identify these rows. In this szenario the doublicate is identified by the union-operator. I could use a binary-checksum or whatever. This is not the question.)

select * from #t1
union
select * from #t2

In this case it is obvious (very simple and fast) to see that c3 (value 3 and 4)makes the difference between the two rows which are supposed to be equal.

1   2   3
1   2   4

How to identify the mismatching cells, when there are hunderts of columns? (Don't tell me not to create such tables...)

Is there a generic way? I'd prefer some kind of 'highlightning' the cells that differ. (Maybe use R-in-database and export to excel?...)


Another way to expain the problem:

I have a table where id should be an unique identifier - but it is not. I have to analyze the data in the table.

 if object_id('tempdb..#t1') is not null drop table #t1

 create table #t1 (
 id int,
 c01 int,
 c02 int,
 c03 int,
 c04 int,
 c05 int,
 c06 int,
 c07 int,
 c08 int,
 c09 int,
 c10 int,
 c11 int,
 c12 int,
 c13 int,
 c14 int,
 c15 int,
 c16 int,
 c17 int,
 c18 int,
 c19 int,
 c20 int,
 c21 int,
 c22 int,
 c23 int,
 c24 int,
 c25 int,
 c26 int,
 c27 int,
 c28 int,
 c29 int,
 c30 int,
 c31 int,
 c32 int,
 c33 int,
 c34 int,
 c35 int,
 c36 int,
 c37 int,
 c38 int,
 c39 int,
 c40 int,
 c41 int,
 c42 int,
 c43 int,
 c44 int,
 c45 int,
 c46 int,
 c47 int,
 c48 int,
 c49 int
 )

 insert #t1 (id, c11) values (1, 1)
 insert #t1 (id, c12) values (1, 1)

 insert #t1 (id, c11) values (2, 1)
 insert #t1 (id, c11) values (3, 1)

 insert #t1 (id, c21) values (4, 1)
 insert #t1 (id, c32) values (4, 1)

That's what I do: I have a look at the data "with problems", that is data with differnt values in some cells. This task is trivial when there are very few columns. When there are hundrets of columns (and not nearly everything is NULL like in this example) this a a demanding task.

 select a.* from #t1 a
 inner join (select id from #t1 group by id having count(*) > 1) b
 on a.id = b.id
 order by id

I would like to see the name of the cells that differ. In this example for example a resultset like this one would do.

 id columnname
 1  c11
 1  c12
 4  c21
 4  c32

Yellowfilled cells in SSMS would also be fine...

Upvotes: 2

Views: 252

Answers (2)

markusk
markusk

Reputation: 6677

Use UNPIVOT to convert the table to one row per column, and compare matching columns:

 SELECT tu1.cell_name, tu1.cell_value, tu2.cell_value
 FROM (SELECT cell_name, cell_value
       FROM #t1
       UNPIVOT (cell_value FOR cell_name IN c1, c2, c3)) tu1
 JOIN (SELECT cell_name, cell_value
       FROM #t2
       UNPIVOT (cell_value FOR cell_name IN c1, c2, c3)) tu2
 ON (tu1.cell_name = tu2.cell_name AND tu1.cell_value <> tu2.cell_value);

Untested, SQL Fiddle is acting up.

If you don't know the columns in advance, see https://stackoverflow.com/a/13377114/108326 for how to call UNPIVOT dynamically. This approach retrieves the column list from database metadata, then generates an UNPIVOT query on the fly.

Upvotes: 0

John Cappelletti
John Cappelletti

Reputation: 81970

The following will "dynamically" unpivot your data by using a little XML.

UnPivot would be more performant, but at least here you don't have to specify all the fields

Example

;with cte as (
Select A.ID
      ,C.*
      ,Fields = sum(1) over (Partition by A.ID)
      ,Vals   = sum(1) over (Partition by A.ID,Field,Value)
 From #t1 A
 Cross Apply ( values (cast((Select A.* for XML RAW) as xml))) B(XMLData)
 Cross Apply (
                Select Field = a.value('local-name(.)','varchar(100)')
                      ,Value = a.value('.','varchar(max)') 
                 From  B.XMLData.nodes('/row')  as C1(n)
                 Cross Apply C1.n.nodes('./@*') as C2(a)
                 Where a.value('local-name(.)','varchar(100)') not in ('id','OtherExcludeColumns')
             ) C
) 
Select ID
      ,Field
 From  cte
 Where Fields>1 and Vals=1 

Returns

ID  Field
1   c11
1   c12
4   c21
4   c32

EDIT

A shortcut just to find rows which have different values

Select A.ID
 From  #T1 A
 Cross Apply ( values ((Select A.* for XML RAW) )) B(XMLData) 
 Group by A.ID
 Having min(XMLData)<>max(XMLData)

Returns

ID
1
4

Upvotes: 2

Related Questions