Reputation: 1491
I'm looking for a good way to do the following:
We have a table Accounts
where some records have accidentally been imported twice, so they are duplicates. I have found that I can select all rows which are imported twice with the following query:
select name, vatnumber from Accounts
WHERE IsDeleted='false'
GROUP BY name, vatnumber
HAVING count(*) > 1
Basically: if the name
and the vatnumber
are the same, they are duplicates.
This gives me all the records that have been imported twice. Now I'm looking for a way to get the first Id of every double record so I can do something like:
UPDATE Accounts SET IsDeleted='true'
WHERE Id = (select id ...)
So basically, I'm trying to update the first row of every double record, so there are no more doubles. Can anyone please point me in the right direction? I haven't got the faintest clue on how to start doing this besides doing manual labour and I'm guessing there is a much easier way.
Sample data:
Id VatNumber Name
1 BE10128292 Microsoft
2 BE99292200 Google
3 BE10128292 Microsoft
4 BE99292200 Some other company
5 BE99292200 Google
Desired result:
Id VatNumber Name
1 BE10128292 Microsoft
2 BE99292200 Google
It doesn't matter whether I get the first Microsoft
or the last Microsoft
record really. Ideally, it would be the first though.
Upvotes: 0
Views: 2724
Reputation: 105
Try this,
;WITH removeDup as
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY vatnumber,name ORDER BY ID ) DupId
from Accounts
)
DELETE from removeDup where DupId=2
Upvotes: 0
Reputation: 19
Try CTE
as below :
WITH CTE
AS (
SELECT vatnumber,
name,
ROW_NUMBER() OVER(PARTITION BY vatnumber,
name ORDER BY ID) rowid,
IsDeleted
FROM Accounts
WHERE IsDeleted = 'false')
UPDATE CTE
SET
IsDeleted = 'true'
WHERE rowid > 1;
Upvotes: 1
Reputation: 50163
Try below using CTE
:
;WITH cte
AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY vatnumber ORDER BY id) rn
FROM <tablename>)
SELECT Id,
VatNumber,
Name
FROM cte
WHERE rn = 1;
Desired Result :
Id VatNumber Name
1 BE10128292 Microsoft
2 BE99292200 Google
Upvotes: 2
Reputation: 3096
Check This.
with CTE as
(
select *,ROW_NUMBER()over(partition by vatnumber,name order by ID )rowid
from #tableName
)
select * from CTE where rowid='2' // here you can change RowID 2 or 1
Check Demo Here
Upvotes: 1
Reputation: 22733
This should do the job, run the sample and check the output. It Updates your IsDeleted
flag where your duplicates are found but uses the MIN(ID)
to target the first row only.
CREATE TABLE #dupes
(
id INT,
vatNo NVARCHAR(20),
name NVARCHAR(20),
isDeleted BIT
DEFAULT 0
);
INSERT INTO #dupes
(
id,
vatNo,
name
)
VALUES
(1, 'BE10128292', 'Microsoft'),
(2, 'BE99292200', 'Google'),
(3, 'BE10128292', 'Microsoft'),
(4, 'BE99292200', 'Some other company'),
(5, 'BE99292200', 'Google');
UPDATE #dupes
SET isDeleted = 1
WHERE id IN (
SELECT MIN(id) MinId
FROM #dupes
WHERE isDeleted = 0
GROUP BY name,
vatNo
HAVING COUNT(*) > 1
);
SELECT *
FROM #dupes AS d;
DROP TABLE #dupes;
Produces:
id vatNo name isDeleted
1 BE10128292 Microsoft 1
2 BE99292200 Google 1
3 BE10128292 Microsoft 0
4 BE99292200 Some other company 0
5 BE99292200 Google 0
So specific to your database, the query would be:
UPDATE Accounts
SET isDeleted = 1
WHERE Id IN (
SELECT MIN(id) MinId
FROM Accounts
WHERE isDeleted = 0
GROUP BY name,
vatNo
HAVING COUNT(*) > 1
);
Upvotes: 3