Reputation: 543
Sometimes I wonder if there is a good practise to identify the cell(s) in which two rows of a database table differ when they are supposed to be eqal. This is a very simple scenario for the problem:
Two tables with one row each:
select 1 c1, 2 c2, 3 c3 into #t1
select 1 c1, 2 c2, 4 c3 into #t2
The two rows are identified as different (It is not my question how to identify these rows. In this szenario the doublicate is identified by the union-operator. I could use a binary-checksum or whatever. This is not the question.)
select * from #t1
union
select * from #t2
In this case it is obvious (very simple and fast) to see that c3 (value 3 and 4)makes the difference between the two rows which are supposed to be equal.
1 2 3
1 2 4
How to identify the mismatching cells, when there are hunderts of columns? (Don't tell me not to create such tables...)
Is there a generic way? I'd prefer some kind of 'highlightning' the cells that differ. (Maybe use R-in-database and export to excel?...)
Another way to expain the problem:
I have a table where id should be an unique identifier - but it is not. I have to analyze the data in the table.
if object_id('tempdb..#t1') is not null drop table #t1
create table #t1 (
id int,
c01 int,
c02 int,
c03 int,
c04 int,
c05 int,
c06 int,
c07 int,
c08 int,
c09 int,
c10 int,
c11 int,
c12 int,
c13 int,
c14 int,
c15 int,
c16 int,
c17 int,
c18 int,
c19 int,
c20 int,
c21 int,
c22 int,
c23 int,
c24 int,
c25 int,
c26 int,
c27 int,
c28 int,
c29 int,
c30 int,
c31 int,
c32 int,
c33 int,
c34 int,
c35 int,
c36 int,
c37 int,
c38 int,
c39 int,
c40 int,
c41 int,
c42 int,
c43 int,
c44 int,
c45 int,
c46 int,
c47 int,
c48 int,
c49 int
)
insert #t1 (id, c11) values (1, 1)
insert #t1 (id, c12) values (1, 1)
insert #t1 (id, c11) values (2, 1)
insert #t1 (id, c11) values (3, 1)
insert #t1 (id, c21) values (4, 1)
insert #t1 (id, c32) values (4, 1)
That's what I do: I have a look at the data "with problems", that is data with differnt values in some cells. This task is trivial when there are very few columns. When there are hundrets of columns (and not nearly everything is NULL like in this example) this a a demanding task.
select a.* from #t1 a
inner join (select id from #t1 group by id having count(*) > 1) b
on a.id = b.id
order by id
I would like to see the name of the cells that differ. In this example for example a resultset like this one would do.
id columnname
1 c11
1 c12
4 c21
4 c32
Yellowfilled cells in SSMS would also be fine...
Upvotes: 2
Views: 252
Reputation: 6677
Use UNPIVOT to convert the table to one row per column, and compare matching columns:
SELECT tu1.cell_name, tu1.cell_value, tu2.cell_value
FROM (SELECT cell_name, cell_value
FROM #t1
UNPIVOT (cell_value FOR cell_name IN c1, c2, c3)) tu1
JOIN (SELECT cell_name, cell_value
FROM #t2
UNPIVOT (cell_value FOR cell_name IN c1, c2, c3)) tu2
ON (tu1.cell_name = tu2.cell_name AND tu1.cell_value <> tu2.cell_value);
Untested, SQL Fiddle is acting up.
If you don't know the columns in advance, see https://stackoverflow.com/a/13377114/108326 for how to call UNPIVOT
dynamically. This approach retrieves the column list from database metadata, then generates an UNPIVOT
query on the fly.
Upvotes: 0
Reputation: 81970
The following will "dynamically" unpivot your data by using a little XML.
UnPivot would be more performant, but at least here you don't have to specify all the fields
Example
;with cte as (
Select A.ID
,C.*
,Fields = sum(1) over (Partition by A.ID)
,Vals = sum(1) over (Partition by A.ID,Field,Value)
From #t1 A
Cross Apply ( values (cast((Select A.* for XML RAW) as xml))) B(XMLData)
Cross Apply (
Select Field = a.value('local-name(.)','varchar(100)')
,Value = a.value('.','varchar(max)')
From B.XMLData.nodes('/row') as C1(n)
Cross Apply C1.n.nodes('./@*') as C2(a)
Where a.value('local-name(.)','varchar(100)') not in ('id','OtherExcludeColumns')
) C
)
Select ID
,Field
From cte
Where Fields>1 and Vals=1
Returns
ID Field
1 c11
1 c12
4 c21
4 c32
EDIT
A shortcut just to find rows which have different values
Select A.ID
From #T1 A
Cross Apply ( values ((Select A.* for XML RAW) )) B(XMLData)
Group by A.ID
Having min(XMLData)<>max(XMLData)
Returns
ID
1
4
Upvotes: 2