Reactor
Reactor

Reputation: 19

delete duplicate records in mysql

We have 2 tables called : "post" and " post_extra"

summery construction of "post" table's are: id,postdate,title,description

And for post_extra they are: eid,news_id,rating,views

"id" filed in the first table is related to "news_id" to the second table.

There are more than 100,000 records on the table, that many of them are duplicated. I want to keep only one record and remove duplicate records on "post" table that have the same title, and then remove the related record on "post_extra"

I ran this query on phpmyadmin but the server was crashed. And I had to restart it.

DELETE e 
FROM Post p1, Post p2, Post_extra e 
WHERE p1.postdate > p2.postdate 
  AND p1.title = p2.title 
  AND e.news_id = p1.id

How can I do this?

Upvotes: 0

Views: 189

Answers (3)

Aman Garg
Aman Garg

Reputation: 3290

Suppose you have table named as 'tables' in which you have the duplicate records. Firstly you have to do group by column on which you want to delete duplicate.But I am not doing it with group by.I am writing self join instead of writing nested query or creating temporary table.

SELECT * FROM `names` GROUP BY title, id having count(title) > 1;

This query return number of duplicate records with their title and id.

You don't need to create the temporary table in this case.

To Delete duplicate except one record: In this table it should have auto increment column. The possible solution that I've just come across:

DELETE t1 FROM tables t1, tables t2 WHERE t1.id > t2.id AND t1.title = t2.title

if you want to keep the row with the lowest auto increment id value OR

DELETE t1 FROM tables t1, tables t2 WHERE t1.id < t2.id AND t1.title = n2.title

if you want to keep the row with the highest auto increment id value.

You can cross check your solution,by selecting the duplicate records again by given query:

SELECT * FROM `tables` GROUP BY title, id having count(title) > 1;

If it return 0 result, then you query is successful.

Upvotes: 2

juergen d
juergen d

Reputation: 204904

This will keep entries with the lowest id for each title

DELETE p, e
FROM Post p
left join Post_extra e on e.news_id = p.id
where id not in
(
   select * from 
   (
      select min(id) 
      from post
      group by title
   ) x
)

SQLFiddle demo

Upvotes: 0

Biju Soman
Biju Soman

Reputation: 438

You can delete duplicate record by creating a temporary table with unique index on the fields that you need to check for the duplicate value

then issue

Insert IGNORE into select * from TableWithDuplicates

You will get a temporary table without duplicates .

then delete the records from the original table (TableWithDuplicates) by JOIN the tables

Should be something like

CREATE TEMPORARY TABLE `tmp_post` (
`id` INT(10) NULL,
`postDate` DATE NULL,
`title` VARCHAR(50) NULL,
`description` VARCHAR(50) NULL, UNIQUE INDEX `postDate_title_description`        (`postDate`, `title`, `description`) );

INSERT IGNORE INTO tmp_post
SELECT id,postDate,title,description
FROM post ;

DELETE post.*
FROM post
LEFT JOIN tmp_post tmp ON tmp.id = post.id
WHERE tmp.id IS NULL ;

Sorry I didn't tested this code

Upvotes: 0

Related Questions