Alex

Reputation: 44709

Cleaning up identical rows with SQL

How can I clean a table of rows that are duplicates on a specific field, so that only one row from each set of duplicates is left?

I have a table:

    CREATE TABLE table1 (field1 varchar(255), field2 varchar(255));

I'd like to purge table1 of any unnecessary copies of rows, leaving a random row from each set of rows with the same field1.

Update: Please post MySQL-compatible commands.

Upvotes: 3

Views: 372

Answers (6)

DigitalRoss

Reputation: 146053

Make a new table OR add a unique key, self join, and delete all but the minimum key


New table:

You could make a new table without the duplicates. I imagine you have thought of this already.

    CREATE TABLE new_test (field1 INTEGER, field2 INTEGER);
    INSERT INTO new_test (field1, field2) SELECT DISTINCT field1, field2 FROM test;
    DROP TABLE test;
    RENAME TABLE new_test TO test;

If you had a unique key, you could do a self join and identify the rows to delete as those whose key is greater than the minimum for their group. If you don't have such a key, you can make one:

Make unique key:

    ALTER TABLE t2 ADD COLUMN (pk INTEGER NOT NULL AUTO_INCREMENT, PRIMARY KEY(pk));

Anyway, now you can do a self join and keep MIN(pk):

Self-join and delete dups:

    -- Delete every row whose pk is not the smallest in its (field1, field2) group.
    DELETE dups.* FROM t2 AS dups
        INNER JOIN (
            SELECT field1, field2, MIN(pk) AS MPK FROM t2
            GROUP BY field1, field2 HAVING COUNT(*) > 1 ) AS keep
        ON keep.field1 = dups.field1
           AND keep.field2 = dups.field2
           AND keep.MPK <> dups.pk;
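
The question asks to deduplicate on field1 alone in table1, so a variant of the self join above might look like the following. It is a sketch rather than the exact method of this answer: the helper column name pk is carried over from above, the join is simplified to keep the lowest pk per field1, and it assumes table1 has no primary key or AUTO_INCREMENT column yet.

```sql
-- Assumption: table1 has no existing primary key; pk is a temporary helper column.
ALTER TABLE table1 ADD COLUMN pk INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Delete a row if another row with the same field1 has a smaller pk.
-- The row with the lowest pk in each field1 group has no such partner and survives.
DELETE dups
FROM table1 AS dups
INNER JOIN table1 AS keep
        ON keep.field1 = dups.field1
       AND keep.pk < dups.pk;

-- Optional: remove the helper column again.
ALTER TABLE table1 DROP COLUMN pk;
```

Rows whose field1 is NULL never satisfy the equality in the join, so this sketch leaves all of them in place.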

Upvotes: 0

Andomar

Reputation: 238048

You can use MySQL's ALTER IGNORE syntax for that. The following command will remove any duplicates and leave a random row:

    alter ignore table table1 add unique index index1 (field1);

It would be wise to keep the index in place, so new duplicates cannot be added. But if you'd like, you can remove the index with:

    alter table table1 drop index index1;
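
Because ALTER IGNORE deletes rows without listing them, it may be worth previewing what would be collapsed first. Note also that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so the command above only runs on older servers. A minimal sketch of such a check, using only the table and column from the question (the alias copies is illustrative):

```sql
-- List every field1 value that occurs more than once, most frequent first.
SELECT field1, COUNT(*) AS copies
FROM table1
GROUP BY field1
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

If this returns no rows, the unique index can be added with a plain ALTER TABLE, without IGNORE.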

Upvotes: 0

D'Arcy Rittich

Reputation: 171351

The simplest way is to make use of the MySQL-specific ALTER IGNORE command. It is unintuitive to delete rows by creating an index, but it works very well. The IGNORE keyword means that when you create a unique index, any duplicate rows will be deleted. Leaving the index created below in place will also prevent any future duplicates. If you do not want this behaviour, just drop the index after creating it.

    ALTER IGNORE TABLE table1 ADD UNIQUE INDEX indexname (field1, field2);

Upvotes: 4

Fragsworth

Reputation: 35497

In MySQL:

    CREATE TABLE `new_table` LIKE `table1`;
    INSERT INTO `new_table` ( SELECT * FROM `table1` GROUP BY field1 );
    DROP TABLE `table1`;
    RENAME TABLE `new_table` TO `table1`;

This won't exactly choose a "random" duplicate row, but it may accomplish what you desire if you don't care about that.

If you have more fields that need to be unique in combination with the rest, add them to the GROUP BY clause.
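
With the ONLY_FULL_GROUP_BY SQL mode enabled (part of the default sql_mode since MySQL 5.7.5), SELECT * combined with GROUP BY field1 is rejected, because field2 is neither grouped nor aggregated. A hedged variant that stays valid in that mode wraps field2 in an aggregate. MIN is an arbitrary choice for this sketch, so the surviving field2 is the smallest value in each group rather than a random one:

```sql
CREATE TABLE `new_table` LIKE `table1`;

-- One row per field1; MIN(field2) picks a deterministic representative.
INSERT INTO `new_table` (field1, field2)
    SELECT field1, MIN(field2)
    FROM `table1`
    GROUP BY field1;

DROP TABLE `table1`;
RENAME TABLE `new_table` TO `table1`;
```

For the two-column table in the question this keeps a value that really occurs alongside each field1. With more columns, aggregating each one separately could combine values from different rows, so this shape would need rethinking.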

EDIT: Reverted to old answer

Upvotes: 1

Dave Markle

Reputation: 97671

This should do it (untested, in SQL Server):

    SELECT field1, field2
    INTO #temp
    FROM
       (SELECT ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY NEWID()) AS __ROW, *
        FROM table1) x
    WHERE x.__ROW = 1;

    DELETE table1;

    INSERT table1
    SELECT field1, field2
    FROM #temp;
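
The question asks for MySQL, where window functions such as ROW_NUMBER() are available from version 8.0 onward. A rough MySQL translation of the same idea might look like this. It is a sketch under assumptions: RAND() stands in for NEWID(), and the names table1_dedup, table1_old, ranked, and rn are illustrative, not from the original answer.

```sql
-- Requires MySQL 8.0 or later (window functions).
CREATE TABLE table1_dedup LIKE table1;

-- Number the rows of each field1 group in random order and keep only the first.
INSERT INTO table1_dedup (field1, field2)
SELECT field1, field2
FROM (
    SELECT field1,
           field2,
           ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY RAND()) AS rn
    FROM table1
) AS ranked
WHERE rn = 1;

-- Swap the tables in one statement, then discard the original.
RENAME TABLE table1 TO table1_old, table1_dedup TO table1;
DROP TABLE table1_old;
```

Unlike the GROUP BY approaches, this does pick a random survivor from each group, which matches the wording of the question.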

Upvotes: 0

OMG Ponies

Reputation: 332521

Working off Fragsworth's answer, I'd:

  1. Create a new table: NEW_TABLE
  2. Define the field1 as the primary key
  3. Insert rows into NEW_TABLE from the old table
  4. Drop the old table
  5. Rename NEW_TABLE to whatever the old table was called

The primary key would stop rows with the same field1 value from being inserted, and be better overall for later queries.
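
The steps above can be sketched as follows. This is one possible reading, not necessarily what the answer intends: a plain INSERT ... SELECT would stop with a duplicate-key error at the first repeated field1, so the sketch uses INSERT IGNORE to skip those rows instead. It assumes field1 contains no NULLs (a primary key column must be NOT NULL) and that a 255-character key fits within the storage engine's index length limit.

```sql
-- Steps 1 and 2: new table with field1 as the primary key.
CREATE TABLE new_table (
    field1 VARCHAR(255) NOT NULL,
    field2 VARCHAR(255),
    PRIMARY KEY (field1)
);

-- Step 3: copy rows; IGNORE skips any row whose field1 is already present.
INSERT IGNORE INTO new_table (field1, field2)
SELECT field1, field2
FROM table1;

-- Steps 4 and 5: replace the old table with the deduplicated one.
DROP TABLE table1;
RENAME TABLE new_table TO table1;
```

Which duplicate survives depends on the order in which the SELECT returns rows, so the kept row is arbitrary rather than truly random.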

Upvotes: 1
