Reputation: 800
I have a table that has 0.5 to 1 million records approx. The table has duplicate records by a varchar field.
Something like
ID int(11)
TITLE varchar(100)
There are many records that have duplicate TITLE values.
I need to remove the duplicate records. The first step is to find them. I suppose there are two methods:
To use GROUP BY TITLE HAVING COUNT(*) > 1
Go record by record using SELECT ID, TITLE FROM MY_TABLE
and then for each record use a SELECT ID FROM MY_TABLE WHERE TITLE='".$row['TITLE']." AND ID <> '".$row['ID']."'
The question is that which method is efficient?
Or is there another more efficient method?
I am using php as the base language.
Upvotes: 1
Views: 130
Reputation: 13519
In order to delete all duplicate records (having same title) you can use this query:
DELETE B
FROM your_table A
INNER JOIN your_table B ON A.TITLE = B.TITLE AND A.ID < B.ID.
This query will delete all the latter duplicate records while keeping the first one.
In order to keep the latest while deleting all the previous duplicate records:
DELETE A
FROM your_table A
INNER JOIN your_table B ON A.TITLE = B.TITLE AND A.ID < B.ID.
EDIT:
If you want to see which latter (duplicate) records will be deleted before executing the delete operation run the following query:
SELECT B.*
FROM your_table A
INNER JOIN your_table B ON A.TITLE = B.TITLE AND A.ID < B.ID.
This assumes that all NEW ID's will be greater than than prior ID's. A pretty safe assumption but caution is warranted
Thanks @xQbert for this addition.
Upvotes: 6
Reputation: 2168
Use the below sql
DELETE FROM MY_TABLE WHERE ID NOT IN (SELECT MIN(x.ID) FROM MY_TABLE x GROUP BY x.TITLE);
Upvotes: 0