Reputation: 13
I have tried looking around for this issue and couldn't find much. I was wondering the best method of removing duplicate parent/child rows in an SQL table.
For example I have a table like so:
Id | CompanyId | DuplicateId
1 | 1 | 2
2 | 2 | 1 <------ CompanyId 2 is already a duplicate of 1
3 | 2 | 3
I wish to select all non duplicates:
Id | CompanyId | DuplicateId
1 | 1 | 2
3 | 2 | 3
Any help or pointing in the right direction would be great. Thanks!
Edit: I'm using Microsoft SQL Server
Upvotes: 1
Views: 831
Reputation: 555
Another way in Sql Server to remove dups parent child could be this:
delete from your_table where concat(companyid,duplicateid)in
(
SELECT concat(table1.companyid , table1.duplicateid )
FROM your_table table1
inner join your_table table2
ON table1.companyid = table2.duplicateid
AND table1.duplicateid = table2.companyid and table1.companyid>table1.duplicateid
)
Hope it helps.
Upvotes: 0
Reputation: 388
Try this
SELECT * FROM tablename t1 where CompanyId not IN (SELECT Duplicateid from
tablename WHERE CompnayId<>t1.CompanyId and duplicateid > companyid)
Upvotes: 0
Reputation: 1270391
Here is one method, that works in most databases:
delete from sqltable
where duplicateid > companyid and
exists (select 1
from sqltable st2
where st2.duplicateid = sqltable.companyid and
st2.companyid = sqltable.duplicateid
);
Upvotes: 1