Bono
Bono

Reputation: 4869

Remove duplicates from table

I have a table with only a primary key column and a text column. The text column has duplicate values and I want those gone.

What have I tried


I googled around a bit and quickly found what I thought was the answer, which was this:

ALTER IGNORE TABLE tablename ADD UNIQUE INDEX index_name (column1);

But after trying to execute the query I ended up with MySQL saying: "#1062 - Duplicate entry 'v&d' for key 'remove_duplicates'". So after fiddeling with that for a while I found that it didn't work.

After that I tried creating a tmp table and removing the old one, but I couldn't get that going either. I may have gotten the syntax wrong which was:

CREATE table `tmp` like `Tag`

alter table tmp add unique (text)

INSERT IGNORE INTO `tmp` SELECT * FROM `Tag`

RENAME TABLE `Tag` TO `deleteme`
RENAME TABLE `tmp` TO `Tag`

DROP TABLE `deleteme`;



What do I want


A table that has no duplicate values for column 'text'. If anyone sees any errors with my previous methods please let me know, or if you think it should/could be done in a different way please let me know!



Edit


I forgot to mention that I also have a relation hanging on the PK (yeah, quite important I know). Is there some way to "preserve" the relation with the other table as well? I could manually change the id's in the other table if need be, but a way to change that as well would be great.

Upvotes: 1

Views: 242

Answers (2)

Raymond Nijland
Raymond Nijland

Reputation: 11602

This is how i would done it on a pretty large table i assume you have a column "id"

ALTER TABLE Tag ADD UNIQUE INDEX text_id (text, id);

Create a UNIQUE INDEX on column1. id so the next querys should run faster.

Then if you would like to know how many duplicates you had in the table Tag

SELECT COUNT(*) as "total" - COUNT(DISTINCT text) as "duplicates text" FROM Tag

To get the first row that is UNIQUE you should run (if FIFO is imporant)

INSERT INTO `tmp` SELECT MIN(id), text FROM `Tag` GROUP BY text (Gordon Linoff query)

To get the last row that is UNIQUE you should run (if LIFO is imporant)

INSERT INTO `tmp` SELECT MAX(id), text FROM `Tag` GROUP BY text 

Because off the covering index the copy should be pretty quick, if the server doesn't need to create a disk temporary table...

RENAME TABLE `Tag` TO `deleteme`
RENAME TABLE `tmp` TO `Tag`

DROP TABLE `deleteme`;

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271003

When you insert into the table, remove the duplicates then:

CREATE table `tmp` like `Tag`

alter table tmp add unique (text)

INSERT INTO `tmp` SELECT min(pk), text FROM `Tag` group by text;

RENAME TABLE `Tag` TO `deleteme`
RENAME TABLE `tmp` TO `Tag`

DROP TABLE `deleteme`;

Upvotes: 10

Related Questions