Find and delete latter duplicate records

Question

I have a table that has 0.5 to 1 million records approx. The table has duplicate records by a varchar field.

Something like

ID int(11)
TITLE varchar(100)

There are many records that have duplicate TITLE values.

I need to remove the duplicate records. The first step is to find them. I suppose there are two methods:

To use GROUP BY TITLE HAVING COUNT(*) > 1
Go record by record using SELECT ID, TITLE FROM MY_TABLE and then for each record use a SELECT ID FROM MY_TABLE WHERE TITLE='".$row['TITLE']." AND ID <> '".$row['ID']."'

The question is that which method is efficient?

Or is there another more efficient method?

I am using php as the base language.

1000111 · Accepted Answer

In order to delete all duplicate records (having same title) you can use this query:

DELETE B
FROM your_table A 
INNER JOIN your_table B ON A.TITLE = B.TITLE AND A.ID < B.ID.

This query will delete all the latter duplicate records while keeping the first one.

In order to keep the latest while deleting all the previous duplicate records:

DELETE A
FROM your_table A 
INNER JOIN your_table B ON A.TITLE = B.TITLE AND A.ID < B.ID.

EDIT:

If you want to see which latter (duplicate) records will be deleted before executing the delete operation run the following query:

SELECT B.*
FROM your_table A 
INNER JOIN your_table B ON A.TITLE = B.TITLE AND A.ID < B.ID.

This assumes that all NEW ID's will be greater than than prior ID's. A pretty safe assumption but caution is warranted

Thanks @xQbert for this addition.

Answers (2)