Reputation: 670

Remove dulpicate rows from table in faster way

What is the best way to remove the dulpicate rows from a large table(500000+ rows).
I had a code it works well but it is not fast enough.
Here is a code.

DELETE foo
FROM foo
  INNER JOIN (SELECT
        link,
        MIN(id) AS MinId
          FROM foo
          GROUP BY link) b
    ON foo.link = b.link
      AND foo.id != b.MinId

Please tell me anyother way which is faster than this code.
Thanks.......

Upvotes: 0

Answers (5)

Justin

Reputation: 9724

Query:

DELETE f
FROM foo f
WHERE (SELECT MIN(f1.id) 
       FROM (SELECT * FROM foo) f1
       WHERE f1.link = f.link) != f.id

Query to remove duplicate records which between in id=50 to id=100:

DELETE f
FROM foo f
WHERE (SELECT MIN(f1.id) 
       FROM (SELECT * FROM foo) f1
       WHERE f1.link = f.link) != f.id
AND f.id >= 50
AND f.id <= 100

Upvotes: 0

Sunny

Reputation: 99

delete from [table] where rowid not in (select min(rowid) from [table] group by [primary key columns])

It will help in removing the duplicacy from table based on rowids.

Upvotes: 1

sectus

Reputation: 15454

Short way: add constraint

ALTER IGNORE TABLE `foo` ADD UNIQUE `link` (`link`);
DROP INDEX `link` ON `foo`; -- to restore table state

But i think better to create new table with constraint

CREATE  TABLE `temp` LIKE `foo`;
ALTER IGNORE TABLE `temp` ADD UNIQUE `link` (`link`);
INSERT IGNORE INTO `temp` SELECT * FROM `foo`;
RENAME TABLE `foo` TO `old_foo`, `temp` TO `foo`;
DROP TABLE `old_foo`;
DROP INDEX `link` ON `foo`; -- to restore table state

Did not test it.

P.S. Do not forget to lock table

P.P.S. Table cannot be locked. Anyway you must stop writing to that table before manipulations.

Upvotes: 0

Muhammad Raheel

Reputation: 19882

Try this

DELETE foo
FROM foo
  INNER JOIN (SELECT
                link,
                MIN(id) AS MinId
          FROM foo
          GROUP BY link) b
    ON foo.link = b.link
WHERE foo.id <> b.MinId

Upvotes: 0

fthiella

Reputation: 49049

I would try this:

DELETE foo1
FROM foo foo1 INNER JOIN foo foo2
     ON foo1.link = foo2.link AND foo1.id>foo2.id

This will keep only the link with the minimum id.

Upvotes: 2

Remove dulpicate rows from table in faster way

Answers (5)

Related Questions