Bean Huang
Bean Huang

Reputation: 159

Delete duplicate email records

We need to write a SQL query to delete all duplicate email entries in a table named Person, keeping only unique emails based on its smallest Id.

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | [email protected] |
| 2  | [email protected]  |
| 3  | [email protected] |
+----+------------------+

Id is the primary key column for this table.

For example, after running the query, the above Person table should have the following rows:

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | [email protected] |
| 2  | [email protected]  |
+----+------------------+

Note: the output is the whole Person table after executing the SQL. Need to use delete statement.

And here is my answer:

delete Person
from Person p
left join 
    (select Id, Email 
     from
         (select 
              Id, Email,
              ROW_NUMBER() OVER(PARTITION BY Email ORDER BY Id) AS row_num
          from Person) as temp
     where row_num = 1) as temptwo on p.Id = temptwo.Id
where 
    temptwo.Id is null

But after I run my query, it keeps telling me that there is a syntax error in this line:

ROW_NUMBER() OVER(PARTITION BY Email ORDER BY Id) AS row_num

Can someone tell me what's wrong?

Upvotes: 0

Views: 1976

Answers (3)

Thiyagu
Thiyagu

Reputation: 1330

Try This:

SELECT Main.ID,Main.EmaIl_ID FROM @Tab Main
INNER JOIN (
   SELECT MIN(ID) AS MIN_ID,Email_ID From @Tab
   GROUP BY Email_ID) SUB ON Main.ID=SUB.MIN_ID
ORDER BY Main.ID

Hope this Code works fine for your case.

Upvotes: 0

Hendra
Hendra

Reputation: 11

Try this in MySQL:

WITH cte AS (
SELECT p.*, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS seq
FROM Person p)                                               
DELETE FROM Person WHERE id IN (
SELECT id FROM cte
WHERE seq > 1);

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269703

You can just use:

with todelete as (
      select p.*,
             ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Id) AS seqnum
      from Person p
     ) 
delete from todelete
    where seqnum > 1;

That is, SQL Server supports updatable CTEs (and subqueries and views), so you don't need to calculate and join back to the table.

EDIT:

In MySQL, you would use:

delete p
    from p join
         (select email, min(id) as min_id
          from person pp
          group by email
         ) pp
         on p.email = pp.email and p.id > p.min_id;

Upvotes: 6

Related Questions