Update duplicate email addresses on mysql database table

Question

I have a huge database that I have almost over 10k row in my user table and there are 2700 duplicate email addresses.

Basically the application did not limit the users from registering their accounts with the same email address over and over again. I have cleaned the multiple ones -more than 2 times- manually, there weren't many, but there are 2700 email addresses with duplicate value occur at least 2 times. So I want to update the duplicate email addresses and change the email address with a smaller id number to something like from "email@mail.com" to "1email@mail.com", basically adding "1" to the beginning of all duplicate email addresses. I can select and display the duplicate email addresses but could not find the way to update only one of the email addresses and leave the other on untouched.

My table structure is like id username email password.

Booboo · Accepted Answer

If you do not have MySQL 8:

Here I am just prepending the id of the row to the email address:

UPDATE my_table JOIN (
    SELECT email, MAX(id) AS max_id, COUNT(*) AS cnt FROM my_table
    GROUP BY email
    HAVING cnt > 1
) sq ON my_table.email = sq.email AND my_table.id <> sq.max_id
SET my_table.email = CONCAT( my_table.id, my_table.email)
;

See DB-Fiddle

The inner query:

SELECT email, MAX(id) AS max_id, COUNT(*) AS cnt FROM my_table
GROUP BY email
HAVING cnt > 1

looks for all emails that that are duplicated (i.e. there is more than one row with the same email address) and computes the row that has the maximum id value for each email address. For the sample data in my DB-Fiddle demo, it would return the following:

| email            | max_id | cnt |
| ---------------- | ------ | --- |
| emaila@dummy.com | 3      | 3   |
| emailb@dummy.com | 5      | 2   |

The above inner query is aliased as table sq.

Now if I join my_table with the above query as follows:

SELECT my_table.* from my_table join (
  SELECT email, MAX(id) AS max_id, COUNT(*) AS cnt FROM my_table
    GROUP BY email
    HAVING cnt > 1
) sq on my_table.email = sq.email and my_table.id <> sq.max_id

I get:

| id  | email            |
| --- | ---------------- |
| 1   | emaila@dummy.com |
| 2   | emaila@dummy.com |
| 4   | emailb@dummy.com |

because I am selecting from my_table all rows that have duplicate email addresses (condition my_table.email = sq.email except for the rows that have the highest value of id for each email address (condition my_table.id <> sq.max_id).

It is the ids from the above join whose email addresses are to be modified.

Update duplicate email addresses on mysql database table

Answers (2)

Related Questions