stevecross
stevecross

Reputation: 5684

How to remove 'duplicate' records?

I have a table that contains some corrupted records, because I forgot to add an UNIQUE index for two columns. Take a look at the following table for an example:

+----+-------------+--------+------------+
| id | uuid        | object | project_id |
+----+-------------+--------+------------+
| 1  | 73621000001 | screw  | 1          |
| 2  | 73621000002 | screw  | 1          |
| 3  | 73621000003 | screw  | 1          |
| 4  | 73621000004 | tube   | 1          |
| 5  | 73621000005 | plate  | 2          |
| 6  | 73621000006 | plate  | 2          |
| 7  | 73621000007 | plate  | 2          |
| 8  | 73621000008 | plate  | 2          |
| 9  | 73621000009 | plate  | 2          |
| 10 | 73621000010 | gear   | 4          |
| 11 | 73621000011 | gear   | 4          |
+----+-------------+--------+------------+

As you can see, there are some object-project_id-combinations that occur multiple times, but have different uuids. I want to delete all duplicate records, but keep those with the highest uuid. The resulting table should be this:

+----+-------------+--------+------------+
| id | uuid        | object | project_id |
+----+-------------+--------+------------+
| 3  | 73621000003 | screw  | 1          |
| 4  | 73621000004 | tube   | 1          |
| 9  | 73621000009 | plate  | 2          |
| 11 | 73621000011 | gear   | 4          |
+----+-------------+--------+------------+

I can see which objects have duplicates using the following query:

SELECT uuid, object, project_id, COUNT(*)
FROM uuid_object_mapping
GROUP BY object, project_id
HAVING COUNT(*) > 1;

I can get the 'clean' table using this query:

SELECT MAX(uuid) as uuid, object, project_id
FROM uuid_object_mapping
GROUP BY object, project_id;

And I can verify that the 'clean' table does not contain duplicates using

SELECT uuid, object, project_id, COUNT(*)
FROM (
    SELECT MAX(uuid) as uuid, object_name, project_id
    FROM uuid_object_mapping
    GROUP BY object_name, project_id
) AS clean
GROUP BY object_name, project_id
HAVING COUNT(*) > 1;

But how can I delete everything that is not in the 'clean' table?

Upvotes: 0

Views: 90

Answers (3)

manish kumar
manish kumar

Reputation: 26

Please use partition by along with order by uuid clause. Search for partion by. It is the best technique to remove duplicate.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270993

In MySQL, you can use a join, but you need to be careful about the NULL values:

delete om
    from uuid_object_mapping om join
         (select MAX(uuid) as uuid, object, project_id
          from uuid_object_mapping 
          group by object, project_id
         ) omkeep
         on omkeep.object = om.object and
            omkeep.project_id <=> om.project_id
    where om.uuid <> omkeep.uuid;

The NULL values seem to have disappeared, so you can use this on clause:

         on omkeep.object = om.object and
            omkeep.project_id = om.project_id

Upvotes: 2

Madhivanan
Madhivanan

Reputation: 13700

Try this

select t1.id,t1.uuid,t1.object,t1.project_id from table as t1 inner join
(
select object,max(id) as id from table
group by object
) as t2 on t1.object=t2.object and t1.id=t2.id

Upvotes: -1

Related Questions