Reputation: 44709
How can I remove rows that are duplicates by a specific field, so that only one copy of each set of identical rows is left?
I have a table:
CREATE TABLE table1 (field1 varchar(255), field2 varchar(255));
I'd like to purge table1 of any unnecessary copies of rows, leaving a random row from each set of rows with the same field1.
UPD: Please post MySQL-compatible commands.
Upvotes: 3
Views: 372
Reputation: 146053
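So you could make a new table without dups (I imagine you thought of this already). Note that SELECT DISTINCT only collapses rows that are identical in both columns, so rows that share field1 but differ in field2 are all kept:
CREATE TABLE new_test (field1 VARCHAR(255), field2 VARCHAR(255));
INSERT INTO new_test (field1, field2) SELECT DISTINCT field1, field2 FROM test;
DROP TABLE test;
RENAME TABLE new_test TO test;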
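A minimal sketch of this copy-and-swap approach, using SQLite through Python's standard sqlite3 module as a stand-in for MySQL (SQLite has no RENAME TABLE statement, so it uses ALTER TABLE ... RENAME TO instead). The table and column names come from the answer; the sample rows in the hypothetical demo() helper are made up. It also shows that DISTINCT keeps both rows with field1 = 'a' because their field2 values differ.

```python
import sqlite3


def rebuild_distinct(conn: sqlite3.Connection) -> None:
    """Replace table `test` with a copy holding only distinct (field1, field2) rows."""
    conn.execute("CREATE TABLE new_test (field1 TEXT, field2 TEXT)")
    conn.execute(
        "INSERT INTO new_test (field1, field2) "
        "SELECT DISTINCT field1, field2 FROM test"
    )
    conn.execute("DROP TABLE test")
    # SQLite's spelling of MySQL's RENAME TABLE new_test TO test
    conn.execute("ALTER TABLE new_test RENAME TO test")
    conn.commit()


def demo() -> list:
    """Build a small hypothetical `test` table, dedupe it, and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (field1 TEXT, field2 TEXT)")
    conn.executemany(
        "INSERT INTO test VALUES (?, ?)",
        [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")],
    )
    conn.commit()
    rebuild_distinct(conn)
    return conn.execute(
        "SELECT field1, field2 FROM test ORDER BY field1, field2"
    ).fetchall()


if __name__ == "__main__":
    print(demo())  # [('a', 'x'), ('a', 'y'), ('b', 'z')]
```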
If you had a unique key, you could do a self join and identify the rows to delete as those whose key is greater than the minimum key in their group. If you didn't have such a key, you could make one:
ALTER TABLE t2 ADD COLUMN pk INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY;
Anyway, now you can do a self join and keep MIN(pk):
DELETE dups.* FROM t2 AS dups
INNER JOIN (
    SELECT field1, field2, MIN(pk) AS MPK FROM t2
    GROUP BY field1, field2 HAVING COUNT(*) > 1
) AS keep
    ON keep.field1 = dups.field1
    AND keep.field2 = dups.field2
    AND keep.MPK <> dups.pk;
To dedupe on field1 alone, as the question asks, drop field2 from both the GROUP BY and the join condition.
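The same keep-the-smallest-key idea can be sketched in SQLite through Python's sqlite3 module. This is a substitution, not the answer's exact query: SQLite's DELETE does not support joins, so the sketch uses NOT IN with a grouped subquery, and SQLite's implicit rowid stands in for the added pk column. It groups on field1 only, matching the question. The demo() helper and its rows are hypothetical.

```python
import sqlite3


def delete_dups_keep_min(conn: sqlite3.Connection) -> int:
    """Delete all but the lowest-rowid row for each field1 value in t2.

    Returns the number of rows deleted. To group on both columns, as the
    MySQL answer does, change GROUP BY field1 to GROUP BY field1, field2.
    """
    cur = conn.execute(
        "DELETE FROM t2 WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM t2 GROUP BY field1)"
    )
    conn.commit()
    return cur.rowcount


def demo() -> tuple:
    """Dedupe a small hypothetical t2 and return (rows deleted, remaining rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t2 (field1 TEXT, field2 TEXT)")
    conn.executemany(
        "INSERT INTO t2 VALUES (?, ?)",
        [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")],
    )
    conn.commit()
    deleted = delete_dups_keep_min(conn)
    rows = conn.execute("SELECT field1, field2 FROM t2 ORDER BY field1").fetchall()
    return deleted, rows


if __name__ == "__main__":
    print(demo())  # (2, [('a', 'x'), ('b', 'z')])
```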
Upvotes: 0
Reputation: 238048
You can use MySQL's ALTER IGNORE syntax for that (it was removed in MySQL 5.7.4, so this only works on older servers). The following command removes all duplicates, keeping one arbitrary row per field1 value:
alter ignore table table1 add unique index index1 (field1);
It would be wise to keep the index in place, so new duplicates cannot be added. But if you'd like, you can remove the index with:
alter table table1 drop index index1;
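SQLite has no ALTER IGNORE, and its CREATE UNIQUE INDEX fails outright if duplicates exist, so this sketch (Python's sqlite3 module as a stand-in for MySQL) emulates the effect in two steps: delete the extra rows, then build the index. It then checks why keeping the index is useful: a second row with an existing field1 is rejected with IntegrityError. Keeping the lowest rowid is this sketch's choice, and the helper names and sample rows are hypothetical.

```python
import sqlite3


def dedupe_and_index(conn: sqlite3.Connection) -> None:
    """Emulate ALTER IGNORE TABLE table1 ADD UNIQUE INDEX index1 (field1) in SQLite."""
    with conn:  # commits on success, rolls back if a statement raises
        conn.execute(
            "DELETE FROM table1 WHERE rowid NOT IN "
            "(SELECT MIN(rowid) FROM table1 GROUP BY field1)"
        )
        conn.execute("CREATE UNIQUE INDEX index1 ON table1 (field1)")


def insert_is_rejected(conn: sqlite3.Connection, field1: str) -> bool:
    """Return True if the unique index blocks a new row with this field1."""
    try:
        with conn:
            conn.execute("INSERT INTO table1 VALUES (?, ?)", (field1, "new"))
    except sqlite3.IntegrityError:
        return True
    return False


def demo() -> tuple:
    """Return (row count after dedupe, 'a' rejected?, 'c' rejected?)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (field1 TEXT, field2 TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [("a", "x"), ("a", "y"), ("b", "z")],
    )
    conn.commit()
    dedupe_and_index(conn)
    count = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
    return count, insert_is_rejected(conn, "a"), insert_is_rejected(conn, "c")


if __name__ == "__main__":
    print(demo())  # (2, True, False)
```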
Upvotes: 0
Reputation: 171351
The simplest way is to use the MySQL-specific ALTER IGNORE command. Deleting rows by creating an index is unintuitive, but it works very well. The IGNORE keyword means that when the unique index is created, any rows that duplicate an existing key are deleted. Leaving the index in place also prevents future duplicates; if you do not want this behaviour, just drop the index after creating it. Note that an index on (field1, field2) only removes rows that are identical in both columns; to keep one row per field1, as the question asks, index field1 alone.
ALTER IGNORE TABLE table1 ADD UNIQUE INDEX indexname (field1, field2);
Upvotes: 4
Reputation: 35497
In MySQL:
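CREATE TABLE `new_table` LIKE `table1`;
INSERT INTO `new_table` SELECT `field1`, MIN(`field2`) FROM `table1` GROUP BY `field1`;
DROP TABLE `table1`;
RENAME TABLE `new_table` TO `table1`;
This won't choose a "random" duplicate row (with MIN it keeps the smallest field2 in each group), but it may accomplish what you want if you don't care about that. Selecting SELECT * with GROUP BY field1 instead is rejected by MySQL 5.7.5 and later, where ONLY_FULL_GROUP_BY is enabled by default.
If you have more fields that need to be unique in combination with the rest, add them to the GROUP BY clause and aggregate any remaining columns.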
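A sketch of the GROUP BY rebuild in SQLite via Python's sqlite3 module, standing in for MySQL. SQLite has no CREATE TABLE ... LIKE, so the new table is declared explicitly, and MIN(field2) makes the kept row well defined (the smallest field2 per field1) rather than arbitrary. The demo() helper and its rows are hypothetical.

```python
import sqlite3


def rebuild_grouped(conn: sqlite3.Connection) -> None:
    """Replace table1 with one row per field1, keeping the smallest field2."""
    conn.execute("CREATE TABLE new_table (field1 TEXT, field2 TEXT)")
    conn.execute(
        "INSERT INTO new_table (field1, field2) "
        "SELECT field1, MIN(field2) FROM table1 GROUP BY field1"
    )
    conn.execute("DROP TABLE table1")
    conn.execute("ALTER TABLE new_table RENAME TO table1")
    conn.commit()


def demo() -> list:
    """Rebuild a small hypothetical table1 and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (field1 TEXT, field2 TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [("a", "y"), ("a", "x"), ("b", "z")],
    )
    conn.commit()
    rebuild_grouped(conn)
    return conn.execute("SELECT field1, field2 FROM table1 ORDER BY field1").fetchall()


if __name__ == "__main__":
    print(demo())  # [('a', 'x'), ('b', 'z')]
```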
Upvotes: 1
Reputation: 97671
This should do it (untested, in SQL Server):
SELECT field1, field2
INTO #temp
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY NEWID()) AS __ROW, *
    FROM table1
) x
WHERE x.__ROW = 1;
DELETE FROM table1;
INSERT INTO table1 (field1, field2)
SELECT field1, field2
FROM #temp;
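The same ROW_NUMBER() idea can be sketched with Python's sqlite3 module, using SQLite as a stand-in; it needs SQLite 3.25 or later for window functions, and RANDOM() plays the role of NEWID(). MySQL 8.0 also supports ROW_NUMBER(), but the #temp table and SELECT ... INTO form above are SQL Server syntax, so the sketch uses CREATE TEMP TABLE ... AS SELECT instead. The keep_rows table name, helper names and sample rows are hypothetical.

```python
import sqlite3


def keep_one_random_per_field1(conn: sqlite3.Connection) -> None:
    """Keep one randomly chosen row per field1 value in table1."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or later")
    # Number the rows of each field1 group in random order and stash row 1 of each.
    conn.execute(
        """
        CREATE TEMP TABLE keep_rows AS
        SELECT field1, field2 FROM (
            SELECT field1, field2,
                   ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY RANDOM()) AS rn
            FROM table1
        )
        WHERE rn = 1
        """
    )
    # Empty the original table and refill it from the stash.
    conn.execute("DELETE FROM table1")
    conn.execute("INSERT INTO table1 (field1, field2) SELECT field1, field2 FROM keep_rows")
    conn.execute("DROP TABLE keep_rows")
    conn.commit()


def demo() -> list:
    """Dedupe a small hypothetical table1 and return its rows ordered by field1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (field1 TEXT, field2 TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [("a", "x"), ("a", "y"), ("a", "w"), ("b", "z")],
    )
    conn.commit()
    keep_one_random_per_field1(conn)
    return conn.execute("SELECT field1, field2 FROM table1 ORDER BY field1").fetchall()


if __name__ == "__main__":
    print(demo())
```

Because the choice is random, only the shape of the result is predictable: one row for 'a' (with field2 'x', 'y' or 'w') and the single 'b' row.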
Upvotes: 0
Reputation: 332521
Working off Fragsworth's answer, I'd make field1 the primary key. The primary key would stop rows with the same field1 value from being inserted, and be better overall for later queries.
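A sketch of the primary-key version with Python's sqlite3 module, using SQLite as a stand-in for MySQL: the table is rebuilt with field1 as its primary key and filled with INSERT OR IGNORE (SQLite's counterpart to MySQL's INSERT IGNORE), so only the first row copied for each field1 is kept. Copying in rowid order, so the earliest-inserted row wins, is this sketch's choice; the helper names and rows are hypothetical.

```python
import sqlite3


def rebuild_with_pk(conn: sqlite3.Connection) -> None:
    """Rebuild table1 with field1 as primary key, keeping the first row per field1."""
    # NOT NULL is explicit because SQLite, unlike the SQL standard, lets a
    # non-INTEGER PRIMARY KEY column hold NULLs.
    conn.execute(
        "CREATE TABLE new_table (field1 TEXT NOT NULL PRIMARY KEY, field2 TEXT)"
    )
    # OR IGNORE skips any row whose field1 is already present.
    conn.execute(
        "INSERT OR IGNORE INTO new_table (field1, field2) "
        "SELECT field1, field2 FROM table1 ORDER BY rowid"
    )
    conn.execute("DROP TABLE table1")
    conn.execute("ALTER TABLE new_table RENAME TO table1")
    conn.commit()


def demo() -> tuple:
    """Return (rows after rebuild, whether a later duplicate insert was rejected)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (field1 TEXT, field2 TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [("a", "x"), ("a", "y"), ("b", "z")],
    )
    conn.commit()
    rebuild_with_pk(conn)
    rows = conn.execute("SELECT field1, field2 FROM table1 ORDER BY field1").fetchall()
    try:
        conn.execute("INSERT INTO table1 VALUES ('a', 'q')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    return rows, rejected


if __name__ == "__main__":
    print(demo())  # ([('a', 'x'), ('b', 'z')], True)
```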
Upvotes: 1