user1214633

Reputation: 667

Removing duplicate data from many rows in MySQL?

I am a web developer, so my experience with manipulating large amounts of data is limited.

A coworker is looking for a solution to our data problem. We have a table of about 400k rows listing company names.

Whoever designed it didn't realize that each company needed some kind of unique identifier, so there are duplicate entries for company names.

What method would one use to match these records up by company name and delete the duplicates based on some criterion (another column)?

I was thinking of writing a script to do this in PHP, but I have a hard time believing such a script could finish while making comparisons between so many rows. Any advice?

Upvotes: 1

Views: 298

Answers (3)

Andrey Gurinov

Reputation: 2895

To find the companies that have duplicates in your table, you can use a query like this:

SELECT NAME
FROM companies
GROUP BY NAME
HAVING COUNT(*) > 1
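To try the query without touching real data, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table layout (`id`, `name`, `col`) and the sample rows are made up for illustration; the GROUP BY / HAVING part is written the same way in MySQL. On a 400k-row MySQL table, an index on the name column usually helps the grouping.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table.
# Assumption: the table is called companies and has columns id, name, col.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, col INTEGER)")
conn.executemany(
    "INSERT INTO companies (name, col) VALUES (?, ?)",
    [("Acme", 1), ("Acme", 5), ("Globex", 2),
     ("Initech", 3), ("Initech", 7), ("Initech", 4)],
)

# Same GROUP BY / HAVING query as above, plus the count so you can see
# how many copies each duplicated company has.
dupes = conn.execute(
    """
    SELECT name, COUNT(*) AS copies
    FROM companies
    GROUP BY name
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()
print(dupes)  # [('Acme', 2), ('Initech', 3)]
```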

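The following statement deletes all duplicates except the row with the maximum value in the col column. If several rows share that maximum, all of them are kept, and rows whose col is NULL are never deleted: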

DELETE del
FROM companies AS del
INNER JOIN (
    SELECT NAME, MAX(col) AS col
    FROM companies
    GROUP BY NAME
    HAVING COUNT(*) > 1
) AS sub
    ON del.NAME = sub.NAME AND del.col <> sub.col
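SQLite does not support MySQL's multi-table `DELETE ... JOIN` syntax, so this sketch (same hypothetical `companies` table and sample rows, run from Python) expresses the same rule with a correlated subquery: delete any row whose `col` is below the maximum `col` for its name. The subquery form will not work in MySQL as written, because MySQL refuses a DELETE whose subquery reads from the table being deleted from (error 1093); that restriction is why the join version above is the usual MySQL idiom.

```python
import sqlite3

# Hypothetical companies table with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, col INTEGER)")
conn.executemany(
    "INSERT INTO companies (name, col) VALUES (?, ?)",
    [("Acme", 1), ("Acme", 5), ("Globex", 2),
     ("Initech", 3), ("Initech", 7), ("Initech", 4)],
)

# Delete every row whose col is below the maximum col for its company name.
# Rows tied for the maximum, and rows with a NULL col, are left alone,
# matching the behaviour of the MySQL DELETE ... JOIN.
conn.execute(
    """
    DELETE FROM companies
    WHERE col < (
        SELECT MAX(c2.col) FROM companies AS c2 WHERE c2.name = companies.name
    )
    """
)
conn.commit()

remaining = conn.execute("SELECT name, col FROM companies ORDER BY name").fetchall()
print(remaining)  # [('Acme', 5), ('Globex', 2), ('Initech', 7)]
```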

Upvotes: 0

Pedro Ferreira

Reputation: 649


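DELETE FROM table1
USING table1, table1 AS vtable
WHERE table1.ID > vtable.ID
AND table1.field_name = vtable.field_name

  1. DELETE FROM table1 tells MySQL which table the rows are removed from.
  2. USING table1, table1 AS vtable joins table1 to a second, aliased copy of itself (vtable), so each row can be compared with the other rows.
  3. table1.ID > vtable.ID is a strict comparison, so a record is never compared with itself, and a row is deleted only if another row has a smaller ID.
  4. table1.field_name = vtable.field_name limits the comparison to rows with the same field_name, so for each field_name only the row with the lowest ID is kept.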
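Here is a sketch of the same self-join idea in SQLite from Python, using the table and column names from the answer (`table1`, `ID`, `field_name`) with made-up sample rows. SQLite has no `DELETE ... USING`, so the self-join is written as an `EXISTS` subquery. The strict `>` is what prevents a row from matching itself, and the row with the lowest `ID` for each `field_name` is the one that survives.

```python
import sqlite3

# table1 with an ID and a field_name column; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER PRIMARY KEY, field_name TEXT)")
conn.executemany(
    "INSERT INTO table1 (ID, field_name) VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex"), (3, "Acme"),
     (4, "Initech"), (5, "Acme"), (6, "Globex")],
)

# Delete a row whenever another row with the same field_name has a smaller ID.
# Because ">" is strict, a row never matches itself, so the lowest ID survives.
conn.execute(
    """
    DELETE FROM table1
    WHERE EXISTS (
        SELECT 1 FROM table1 AS vtable
        WHERE vtable.field_name = table1.field_name
          AND table1.ID > vtable.ID
    )
    """
)
conn.commit()

kept = conn.execute("SELECT ID, field_name FROM table1 ORDER BY ID").fetchall()
print(kept)  # [(1, 'Acme'), (2, 'Globex'), (4, 'Initech')]
```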

Upvotes: 0

DNadel

Reputation: 485

The way I've done this in the past is to write a query that returns only the set of rows I want (usually using DISTINCT plus a subquery to pick the right record based on other columns) and insert the result into a different table. You can then drop the old table and rename the new one to the old name.
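A minimal sketch of this copy-and-swap approach, again in SQLite from Python with a hypothetical `companies` table and made-up rows. The criterion (keep the row with the largest `col`, breaking ties by the smallest `id`) is an assumption standing in for whatever "right record" means in your data. Here the correlated subquery picks exactly one `id` per name, so no DISTINCT is needed.

```python
import sqlite3

# Hypothetical companies table with made-up sample rows (note the Globex tie).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, col INTEGER)")
conn.executemany(
    "INSERT INTO companies (name, col) VALUES (?, ?)",
    [("Acme", 1), ("Acme", 5), ("Globex", 2), ("Initech", 3),
     ("Initech", 7), ("Initech", 4), ("Globex", 2)],
)

conn.executescript(
    """
    -- 1. New table with the same columns as the old one.
    CREATE TABLE companies_new (id INTEGER PRIMARY KEY, name TEXT, col INTEGER);

    -- 2. Copy one row per name: the largest col, ties broken by the smallest id.
    INSERT INTO companies_new (id, name, col)
    SELECT c.id, c.name, c.col
    FROM companies AS c
    WHERE c.id = (
        SELECT c2.id FROM companies AS c2
        WHERE c2.name = c.name
        ORDER BY c2.col DESC, c2.id ASC
        LIMIT 1
    );

    -- 3. Drop the old table and give the new one its name.
    DROP TABLE companies;
    ALTER TABLE companies_new RENAME TO companies;
    """
)

rows = conn.execute("SELECT id, name, col FROM companies ORDER BY id").fetchall()
print(rows)  # [(2, 'Acme', 5), (3, 'Globex', 2), (5, 'Initech', 7)]
```

In MySQL, the final swap can be done in one statement, `RENAME TABLE companies TO companies_old, companies_new TO companies`, which renames both tables atomically and keeps the old data around until you have checked the new table.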

Upvotes: 0
