Reputation: 5684
I have a table that contains some corrupted records, because I forgot to
add an UNIQUE
index for two columns.
Take a look at the following table for an example:
+----+-------------+--------+------------+
| id | uuid | object | project_id |
+----+-------------+--------+------------+
| 1 | 73621000001 | screw | 1 |
| 2 | 73621000002 | screw | 1 |
| 3 | 73621000003 | screw | 1 |
| 4 | 73621000004 | tube | 1 |
| 5 | 73621000005 | plate | 2 |
| 6 | 73621000006 | plate | 2 |
| 7 | 73621000007 | plate | 2 |
| 8 | 73621000008 | plate | 2 |
| 9 | 73621000009 | plate | 2 |
| 10 | 73621000010 | gear | 4 |
| 11 | 73621000011 | gear | 4 |
+----+-------------+--------+------------+
As you can see, there are some object
-project_id
-combinations that occur multiple times, but have different uuids
.
I want to delete all duplicate records, but keep those with the highest uuid
.
The resulting table should be this:
+----+-------------+--------+------------+
| id | uuid | object | project_id |
+----+-------------+--------+------------+
| 3 | 73621000003 | screw | 1 |
| 4 | 73621000004 | tube | 1 |
| 9 | 73621000009 | plate | 2 |
| 11 | 73621000011 | gear | 4 |
+----+-------------+--------+------------+
I can see which object
s have duplicates using the following query:
SELECT uuid, object, project_id, COUNT(*)
FROM uuid_object_mapping
GROUP BY object, project_id
HAVING COUNT(*) > 1;
I can get the 'clean' table using this query:
SELECT MAX(uuid) as uuid, object, project_id
FROM uuid_object_mapping
GROUP BY object, project_id;
And I can verify that the 'clean' table does not contain duplicates using
SELECT uuid, object, project_id, COUNT(*)
FROM (
SELECT MAX(uuid) as uuid, object_name, project_id
FROM uuid_object_mapping
GROUP BY object_name, project_id
) AS clean
GROUP BY object_name, project_id
HAVING COUNT(*) > 1;
But how can I delete everything that is not in the 'clean' table?
Upvotes: 0
Views: 90
Reputation: 26
Please use partition by along with order by uuid clause. Search for partion by. It is the best technique to remove duplicate.
Upvotes: 0
Reputation: 1270993
In MySQL, you can use a join
, but you need to be careful about the NULL
values:
delete om
from uuid_object_mapping om join
(select MAX(uuid) as uuid, object, project_id
from uuid_object_mapping
group by object, project_id
) omkeep
on omkeep.object = om.object and
omkeep.project_id <=> om.project_id
where om.uuid <> omkeep.uuid;
The NULL
values seem to have disappeared, so you can use this on
clause:
on omkeep.object = om.object and
omkeep.project_id = om.project_id
Upvotes: 2
Reputation: 13700
Try this
select t1.id,t1.uuid,t1.object,t1.project_id from table as t1 inner join
(
select object,max(id) as id from table
group by object
) as t2 on t1.object=t2.object and t1.id=t2.id
Upvotes: -1