dotty
dotty

Reputation: 41523

Make MySQL table unique

Hay, I created a spider to crawl through a PDF document and log every word in the document into a table in a MySQL database.

Obviously words like 'the', 'and', 'or' etc appear in a book many, many times.

I'm just wondering what's the quickest method to remove dupe values from a table?

Upvotes: 0

Views: 323

Answers (5)

pritaeas
pritaeas

Reputation: 2093

delete from words where idcolumn not in
  (select min(idcolumn) 
   from words T2 
   where T2.plain = WordsTable.plain)

This works if you added (idcolumn, plain) for every word you found.

If you do not have an id column (pk) then you can use Anax's solution.

In addition to not inserting duplicates (codeburger comment), you can just set a unique index on your plain column.

Upvotes: 0

Mike A.
Mike A.

Reputation: 398

If you can rerun the script to populate the database, you could add a unique key on the "word" field and instead of INSERT INTO do a REPLACE INTO. This will delete the previous instance of the record before adding a duplicate field. This may not be the most efficient way to do it, but it's rather simple. See here for more details:

http://dev.mysql.com/doc/refman/5.0/en/replace.html

Upvotes: 1

Anax
Anax

Reputation: 9382

Create a table without indexing the words and put in all the words from the book using mass inserts (you could also use LOAD DATA). When you're done with insertions, add a new Index on the word field

Then create a second table using:

CREATE TABLE newTable SELECT DISTINCT word FROM oldTable

Upvotes: 3

code_burgar
code_burgar

Reputation: 12323

Instead of removing duplicates, you could make sure that no duplicates ever make it into the table.

Presuming your table has only 2 fields, id and word:

INSERT INTO table SELECT null, 'word' FROM table WHERE NOT EXISTS (SELECT * FROM table WHERE word = 'word') LIMIT 1;

This will insert the word into the table only if it's not already in there

Upvotes: 1

NDM
NDM

Reputation: 6840

select distinct on word field, and then delete all rows that have a different id? I'm not a master in subqueries so no example atm :)

Upvotes: 0

Related Questions