Jean R.
Jean R.

Reputation: 544

SQL - Merge duplicate rows by the most recent

I have a big SQL database with these tables for example:

first_name | last_name | email             | country | created_at
-----------------------------------------------------------------
      john | DOE       | [email protected] | USA     | 2016-05-01
      john | DOE       | [email protected] | FRANCE  | 2019-05-03
       doe | John      | [email protected] | CANADA  | 2011-08-23

The previous database was built without a unique email (yes it's horrible). So, I need to merge the user with same email but different data with the most recent record.

Then update the database by deleting the older one and keep the latest one.

Excuse me if it's not clear..

Upvotes: 0

Views: 532

Answers (3)

Morten
Morten

Reputation: 414

You mentioned that this is a big database. I will then suggest that you add an index on the table before running the script by either @forpas or @Gordon Linoff as these scripts might take a long time to complete when dealing with millions of rows.

The index could be created like this:

CREATE INDEX tablename_index ON tablename (email, created_at);

And then afterwards, if you don't need the index any more, you can drop it like this:

DROP INDEX tablename_index ON tablename;

Upvotes: 0

forpas
forpas

Reputation: 164174

With EXISTS:

delete tablename t
where exists (
  select 1 from tablename where email = t.email and created_at > t.created_at
)

EXISTS will return TRUE as soon as it finds 1 row with the same email and date greater than the current row, so it does not need to scan the whole table for every row.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270703

Something like this?

delete t
    where t.created_at < (select max(t2. created_at)
                          from t t2
                          where t2.email = t.email
                         );

Upvotes: 4

Related Questions