A query to find if any two fields in a row are equal?

Question

I have to maintain a scary legacy database that is very poorly designed. All the tables have more than 100 columns - one has 650. The database is very denormalized and I have found that often the same data is expressed in several columns in the same row.

For instance, here is a sample of columns for one of the tables:

[MEMBERADDRESS] [varchar](331) NULL,
[DISPLAYADDRESS] [varchar](max) NULL,

[MEMBERINLINEADDRESS] [varchar](max) NULL,
[DISPLAYINLINEADDRESS] [varchar](250) NULL,

[__HISTISDN] [varchar](25) NULL,
[HISTISDN] [varchar](25) NULL,
[MYDIRECTISDN] [varchar](25) NULL,
[MYISDN] [varchar](25) NULL,

[__HISTALT_PHONE] [varchar](25) NULL,
[HISTALT_PHONE] [varchar](25) NULL,

It turns out that MEMBERADDRESS and DISPLAYADDRESS have the same value for all rows in the table. The same is true for the other clusters of fields I have shown here.

It will be very difficult and time consuming to identify all cases like this manually. Is it possible to create a query that would identify if two fields have the same value in every row in a table?

If not, are there any existing tools that will help me identify these sorts of problems?

Gordon Linoff · Accepted Answer

The following approach uses unpivot to create triples. It makes some assumptions: values are not null; each row has an id; and columns have compatible types.

select t.which, t2.which 
from (select id, which, value
      from MEMBERADDRESS
      unpivot (value for which in ()) up
     ) t full outer join
     (select id, which, value
      from MEMBERADDRESS
      unpivot (value for which in ()) up
     ) t2
     on t.id = t2.id and t.which <> t2.which
group by t.which, t2.which
having sum(case when t.value = t2.value then 1 else 0 end) = count(*)

It works by creating a new table with three columns: id, which column, and the value in the column. It then does a self join on id (to keep comparisons within one row) and value (to get matching values). This self-join should always match, because the columns are the same in the two halves of the query.

The having then counts the number of values that are the same on both sides for a given pair of columns. When all these are the same, then the match is successful.

You can also leave out the having clause and use something like:

select t.which, t2.which, sum(case when t.value = t2.value then 1 else 0 end) as Nummatchs,
       count(*) as NumRows

To get more complete information.

A query to find if any two fields in a row are equal?

Answers (2)

Related Questions