Merge Duplicate Rows in MySQL

Question

I have a database like this:

users
id    name    email                phone
1     bill    bill@fakeemail.com
2     bill    bill@fakeemail.com   123456789
3     susan   susan@fakeemail.com
4     john    john@fakeemail.com   123456789
5     john    john@fakeemail.com   987654321

I want to merge records considered duplicates based on the email field.

I'm trying to work out a query that follows these rules:

Merge rows that share the same email.
If one of the duplicate rows has a null value, keep the row that has the most data.
If two rows are duplicates but their other fields differ, keep the one with the highest id (see the john@fakeemail.com rows for an example).

Here is a query I tried:

DELETE FROM users WHERE users.id NOT IN 
(SELECT grouped.id FROM (SELECT DISTINCT ON (email) * FROM users) AS grouped)

I get a syntax error.

I want the table to end up like this, but I can't figure out the correct query:

users
id   name    email                 phone
2    bill    bill@fakeemail.com    123456789
3    susan   susan@fakeemail.com   
5    john    john@fakeemail.com    987654321

Tim Biegeleisen · Accepted Answer

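Here is one option using a DELETE with a NOT IN subquery:

DELETE
FROM users
WHERE id NOT IN (SELECT id
                 FROM (
                     -- Per email, keep the highest id that has a phone;
                     -- if no row has a phone, fall back to the highest id.
                     -- The fallback also keeps NULL out of the NOT IN list,
                     -- which would otherwise stop the DELETE from removing any rows.
                     SELECT COALESCE(MAX(CASE WHEN phone IS NOT NULL THEN id END),
                                     MAX(id)) AS id
                     FROM users
                     GROUP BY email) t);

The logic of this delete is as follows:

Emails where there is only one record are not deleted.
For emails with two or more records, we delete everything except the record with the highest id value among those where the phone is defined.
If none of the records for an email has a phone, we keep the one with the highest id instead. Without this fallback the subquery would return NULL for that email, and id NOT IN (..., NULL) is never true, so the statement would delete nothing at all.

The extra derived table (aliased t) is there because MySQL does not allow a DELETE to select from its own target table directly in a subquery.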
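
Before running a delete like this, it can help to preview which rows would survive. The following is only a sketch under a few assumptions: users_preview is a hypothetical scratch table, the empty phone cells from the question are stored as NULL, and the keep-rule is written with COALESCE so that an email whose rows all lack a phone falls back to its highest id.

```sql
-- Hypothetical scratch table mirroring the sample data in the question.
CREATE TABLE users_preview (
    id    INT PRIMARY KEY,
    name  VARCHAR(50),
    email VARCHAR(100),
    phone VARCHAR(20)
);

-- Assumption: the blank phone cells in the question are NULL, not ''.
INSERT INTO users_preview (id, name, email, phone) VALUES
    (1, 'bill',  'bill@fakeemail.com',  NULL),
    (2, 'bill',  'bill@fakeemail.com',  '123456789'),
    (3, 'susan', 'susan@fakeemail.com', NULL),
    (4, 'john',  'john@fakeemail.com',  '123456789'),
    (5, 'john',  'john@fakeemail.com',  '987654321');

-- Dry run: list the rows that would be kept, one per email.
-- Nothing is deleted here.
SELECT u.id, u.name, u.email, u.phone
FROM users_preview u
WHERE u.id IN (SELECT id
               FROM (
                   SELECT COALESCE(MAX(CASE WHEN phone IS NOT NULL THEN id END),
                                   MAX(id)) AS id
                   FROM users_preview
                   GROUP BY email) t)
ORDER BY u.id;
-- On the sample data this returns the rows with id 2, 3 and 5.
```

If the preview matches the expected result, changing IN to NOT IN shows the rows that the delete would remove.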
