Reputation: 19
We have 2 tables called : "post" and " post_extra"
summery construction of "post" table's are: id,postdate,title,description
And for post_extra they are: eid,news_id,rating,views
"id" filed in the first table is related to "news_id" to the second table.
There are more than 100,000 records on the table, that many of them are duplicated. I want to keep only one record and remove duplicate records on "post" table that have the same title, and then remove the related record on "post_extra"
I ran this query on phpmyadmin but the server was crashed. And I had to restart it.
DELETE e
FROM Post p1, Post p2, Post_extra e
WHERE p1.postdate > p2.postdate
AND p1.title = p2.title
AND e.news_id = p1.id
How can I do this?
Upvotes: 0
Views: 189
Reputation: 3290
Suppose you have table named as 'tables' in which you have the duplicate records. Firstly you have to do group by column on which you want to delete duplicate.But I am not doing it with group by.I am writing self join instead of writing nested query or creating temporary table.
SELECT * FROM `names` GROUP BY title, id having count(title) > 1;
This query return number of duplicate records with their title and id.
You don't need to create the temporary table in this case.
To Delete duplicate except one record: In this table it should have auto increment column. The possible solution that I've just come across:
DELETE t1 FROM tables t1, tables t2 WHERE t1.id > t2.id AND t1.title = t2.title
if you want to keep the row with the lowest auto increment id value OR
DELETE t1 FROM tables t1, tables t2 WHERE t1.id < t2.id AND t1.title = n2.title
if you want to keep the row with the highest auto increment id value.
You can cross check your solution,by selecting the duplicate records again by given query:
SELECT * FROM `tables` GROUP BY title, id having count(title) > 1;
If it return 0 result, then you query is successful.
Upvotes: 2
Reputation: 204904
This will keep entries with the lowest id
for each title
DELETE p, e
FROM Post p
left join Post_extra e on e.news_id = p.id
where id not in
(
select * from
(
select min(id)
from post
group by title
) x
)
Upvotes: 0
Reputation: 438
You can delete duplicate record by creating a temporary table with unique index on the fields that you need to check for the duplicate value
then issue
Insert IGNORE into select * from TableWithDuplicates
You will get a temporary table without duplicates .
then delete the records from the original table (TableWithDuplicates) by JOIN the tables
Should be something like
CREATE TEMPORARY TABLE `tmp_post` (
`id` INT(10) NULL,
`postDate` DATE NULL,
`title` VARCHAR(50) NULL,
`description` VARCHAR(50) NULL, UNIQUE INDEX `postDate_title_description` (`postDate`, `title`, `description`) );
INSERT IGNORE INTO tmp_post
SELECT id,postDate,title,description
FROM post ;
DELETE post.*
FROM post
LEFT JOIN tmp_post tmp ON tmp.id = post.id
WHERE tmp.id IS NULL ;
Sorry I didn't tested this code
Upvotes: 0