Steven Lemmens
Steven Lemmens

Reputation: 1491

How to find and update the first row of duplicated rows with SQL

I'm looking for a good way to do the following:

We have a table Accounts where some records have accidentally been imported twice, so they are duplicates. I have found that I can select all rows which are imported twice with the following query:

select name, vatnumber from Accounts 
WHERE IsDeleted='false'
GROUP BY name, vatnumber
HAVING count(*) > 1

Basically: if the name and the vatnumber are the same, they are duplicates.

This gives me all the records that have been imported twice. Now I'm looking for a way to get the first Id of every double record so I can do something like:

UPDATE Accounts SET IsDeleted='true'
WHERE Id = (select id ...) 

So basically, I'm trying to update the first row of every double record, so there are no more doubles. Can anyone please point me in the right direction? I haven't got the faintest clue on how to start doing this besides doing manual labour and I'm guessing there is a much easier way.

Sample data:

Id     VatNumber     Name
 1     BE10128292    Microsoft
 2     BE99292200    Google
 3     BE10128292    Microsoft
 4     BE99292200    Some other company
 5     BE99292200    Google

Desired result:

Id     VatNumber     Name
 1     BE10128292    Microsoft
 2     BE99292200    Google

It doesn't matter whether I get the first Microsoft or the last Microsoft record really. Ideally, it would be the first though.

Upvotes: 0

Views: 2724

Answers (5)

pankaj singh
pankaj singh

Reputation: 105

Try this,

;WITH  removeDup  as

( 

SELECT *,ROW_NUMBER() OVER(PARTITION BY vatnumber,name ORDER BY ID ) DupId  

from Accounts

)

DELETE from removeDup where DupId=2 

Upvotes: 0

Arun Udumbassery
Arun Udumbassery

Reputation: 19

Try CTE as below :

WITH CTE
         AS (
         SELECT vatnumber,
                name,
                ROW_NUMBER() OVER(PARTITION BY vatnumber,
                                               name ORDER BY ID) rowid,
                IsDeleted
         FROM Accounts
         WHERE IsDeleted = 'false')
         UPDATE CTE
           SET
               IsDeleted = 'true'
         WHERE rowid > 1; 

Upvotes: 1

Yogesh Sharma
Yogesh Sharma

Reputation: 50163

Try below using CTE :

;WITH cte
     AS (
     SELECT *,
            ROW_NUMBER() OVER(PARTITION BY vatnumber ORDER BY id) rn
     FROM <tablename>)
     SELECT Id,
            VatNumber,
            Name
     FROM cte
     WHERE rn = 1;

Desired Result :

Id     VatNumber     Name
 1     BE10128292    Microsoft
 2     BE99292200    Google

Upvotes: 2

Mr. Bhosale
Mr. Bhosale

Reputation: 3096

Check This.

with  CTE  as
( 
select *,ROW_NUMBER()over(partition by vatnumber,name order by ID )rowid  
from #tableName
)
select * from CTE where rowid='2' // here you can change RowID 2 or 1

Check Demo Here

Upvotes: 1

Tanner
Tanner

Reputation: 22733

This should do the job, run the sample and check the output. It Updates your IsDeleted flag where your duplicates are found but uses the MIN(ID) to target the first row only.

CREATE TABLE #dupes
(
    id INT,
    vatNo NVARCHAR(20),
    name NVARCHAR(20),
    isDeleted BIT
        DEFAULT 0
);

INSERT INTO #dupes
(
    id,
    vatNo,
    name
)
VALUES
(1, 'BE10128292', 'Microsoft'),
(2, 'BE99292200', 'Google'),
(3, 'BE10128292', 'Microsoft'),
(4, 'BE99292200', 'Some other company'),
(5, 'BE99292200', 'Google');


UPDATE #dupes
SET isDeleted = 1
WHERE id IN (
                SELECT MIN(id) MinId
                FROM #dupes
                WHERE isDeleted = 0
                GROUP BY name,
                         vatNo
                HAVING COUNT(*) > 1
            );

SELECT *
FROM #dupes AS d;

DROP TABLE #dupes;

Produces:

id  vatNo       name                isDeleted
1   BE10128292  Microsoft           1
2   BE99292200  Google              1
3   BE10128292  Microsoft           0
4   BE99292200  Some other company  0
5   BE99292200  Google              0

So specific to your database, the query would be:

UPDATE Accounts 
SET isDeleted = 1
WHERE Id IN (
                SELECT MIN(id) MinId
                FROM Accounts
                WHERE isDeleted = 0
                GROUP BY name,
                         vatNo
                HAVING COUNT(*) > 1
            );

Upvotes: 3

Related Questions