Reputation: 670
What is the best way to remove the dulpicate rows from a large table(500000+ rows).
I had a code it works well but it is not fast enough.
Here is a code.
DELETE foo
FROM foo
INNER JOIN (SELECT
link,
MIN(id) AS MinId
FROM foo
GROUP BY link) b
ON foo.link = b.link
AND foo.id != b.MinId
Please tell me anyother way which is faster than this code.
Thanks.......
Upvotes: 0
Views: 106
Reputation: 9724
Query:
DELETE f
FROM foo f
WHERE (SELECT MIN(f1.id)
FROM (SELECT * FROM foo) f1
WHERE f1.link = f.link) != f.id
Query to remove duplicate records which between in id=50 to id=100:
DELETE f
FROM foo f
WHERE (SELECT MIN(f1.id)
FROM (SELECT * FROM foo) f1
WHERE f1.link = f.link) != f.id
AND f.id >= 50
AND f.id <= 100
Upvotes: 0
Reputation: 99
delete from [table] where rowid not in (select min(rowid) from [table] group by [primary key columns])
It will help in removing the duplicacy from table based on rowids.
Upvotes: 1
Reputation: 15454
Short way: add constraint
ALTER IGNORE TABLE `foo` ADD UNIQUE `link` (`link`);
DROP INDEX `link` ON `foo`; -- to restore table state
But i think better to create new table with constraint
CREATE TABLE `temp` LIKE `foo`;
ALTER IGNORE TABLE `temp` ADD UNIQUE `link` (`link`);
INSERT IGNORE INTO `temp` SELECT * FROM `foo`;
RENAME TABLE `foo` TO `old_foo`, `temp` TO `foo`;
DROP TABLE `old_foo`;
DROP INDEX `link` ON `foo`; -- to restore table state
Did not test it.
P.S. Do not forget to lock table
P.P.S. Table cannot be locked. Anyway you must stop writing to that table before manipulations.
Upvotes: 0
Reputation: 19882
Try this
DELETE foo
FROM foo
INNER JOIN (SELECT
link,
MIN(id) AS MinId
FROM foo
GROUP BY link) b
ON foo.link = b.link
WHERE foo.id <> b.MinId
Upvotes: 0
Reputation: 49049
I would try this:
DELETE foo1
FROM foo foo1 INNER JOIN foo foo2
ON foo1.link = foo2.link AND foo1.id>foo2.id
This will keep only the link with the minimum id.
Upvotes: 2