Imran Ahmed
Imran Ahmed

Reputation: 800

Find and delete latter duplicate records

I have a table that has 0.5 to 1 million records approx. The table has duplicate records by a varchar field.

Something like

ID int(11)
TITLE varchar(100)

There are many records that have duplicate TITLE values.

I need to remove the duplicate records. The first step is to find them. I suppose there are two methods:

  1. To use GROUP BY TITLE HAVING COUNT(*) > 1

  2. Go record by record using SELECT ID, TITLE FROM MY_TABLE and then for each record use a SELECT ID FROM MY_TABLE WHERE TITLE='".$row['TITLE']." AND ID <> '".$row['ID']."'

The question is that which method is efficient?

Or is there another more efficient method?

I am using php as the base language.

Upvotes: 1

Views: 130

Answers (2)

1000111
1000111

Reputation: 13519

In order to delete all duplicate records (having same title) you can use this query:

DELETE B
FROM your_table A 
INNER JOIN your_table B ON A.TITLE = B.TITLE AND A.ID < B.ID.

This query will delete all the latter duplicate records while keeping the first one.


In order to keep the latest while deleting all the previous duplicate records:

DELETE A
FROM your_table A 
INNER JOIN your_table B ON A.TITLE = B.TITLE AND A.ID < B.ID.

EDIT:

If you want to see which latter (duplicate) records will be deleted before executing the delete operation run the following query:

SELECT B.*
FROM your_table A 
INNER JOIN your_table B ON A.TITLE = B.TITLE AND A.ID < B.ID.

This assumes that all NEW ID's will be greater than than prior ID's. A pretty safe assumption but caution is warranted

Thanks @xQbert for this addition.

Upvotes: 6

Naga
Naga

Reputation: 2168

Use the below sql

DELETE FROM MY_TABLE WHERE ID NOT IN (SELECT MIN(x.ID) FROM MY_TABLE x GROUP BY x.TITLE);

Upvotes: 0

Related Questions