Reputation: 163
I have a SQL Server database with a unique key column and 49 columns of data elements (name/address/etc......). I have "duplicate" entries but with different keys and I want to find those duplicates entries.
As an example, I may have "John Smith" (with 47 other columns of information) in the table twice. Both John Smith entries will have a different unique key column, but other than that, all other columns would be identical. Including if one off the columns is NULL, then it will be NULL for both John Smith entries.
To complicate things, there are two tables which I need to join together, then once joined find any entries where data elements (everything except for the key) is the same.
Table1 layout
MyKey, table2ID, Col1, Col2, Col3....Col46.
Table2 layout
ID, col47, col48, col49
Col1 through to Col49 is where the "duplicate" data could be.
I have tried something like the below, which almost works. It fails if I have NULL values. For example, if Col22 is NULL on both John Smith entries (ie they are both the same NULL value) then they are not picked up in the selection.
Question: how do I get something like the below to work even when there are NULL values that need to be compared against each other.
with MyJoinedTable as
(
select PolicyNumber, col01, col02, col03......col49
from table1
inner join table2 on table2id = table2.id
)
select PolicyNumber, t1.col01, t1.col02, t1.col03.......t1.col49
from MyJoinedTable t1
inner join (select col01, col02, col03......col49
from MyJoinedTable
group by col01, col02, col03......col49
having count(*) > 1) t2
on t1.col01 = t2.col01
and t1.col02 = t2.col02
.......
and t1.col49 = t2.col49
order by t1.col01, t1.col02
Upvotes: 0
Views: 375
Reputation: 1269693
One method is:
select t.*
from t
where exists (select 1
from t t2
where t2.col1 = t.col1 and
t2.col2 = t.col2 and
. . .
t2.policyNumber <> t.policyNumber
);
This works assuming that none of the other columns are NULL
.
EDIT:
If you are using SQL Server, I would just do:
select t.*
from (select t.*,
min(id) over (partition by col1, col2, . . . ) as min_id,
max(id) over (partition by col1, col2, . . . ) as max_id
from t
) t
where minid <> maxid;
Upvotes: 0
Reputation: 37472
Group in a subquery with HAVING count(*) > 1
and join it back in.
SELECT to1.policynumber,
to1.col1,
...
to1.col49
FROM elbat to1
INNER JOIN (SELECT ti.col1,
...
ti.col49
FROM elbat ti
GROUP BY col1,
...
col49
HAVING count(*) > 1) to2
ON to2.col1 = to1.col1
...
AND to2.col49 = to1.col49;
Or use an EXISTS
.
SELECT to.policynumber,
to.col1,
...
to.col49
FROM elbat to
WHERE EXISTS (SELECT *
FROM elbat ti
WHERE ti.policynumber <> to.policynumber
AND ti.col1 = to.col1
...
AND ti.col49 = to.col49);
Upvotes: 0