andrzej1_1
andrzej1_1

Reputation: 1193

Delete duplicates from db

I have table like following

id | a_id | b_id | success
--------------------------
1    34      43      1
2    34      84      1
3    34      43      0
4    65      43      1
5    65      84      1
6    93      23      0
7    93      23      0

I want delete duplicates with same a_id and b_id, but I want keep one record. If possible kept record should be with success=1. So in example table third and sixth/seventh record should be deleted. How to do this?

I'm using MySQL 5.1

Upvotes: 1

Views: 114

Answers (4)

georgecj11
georgecj11

Reputation: 1636

In MySQL, if you dont want to care about which record is maintained, a single alter table will work.

ALTER IGNORE TABLE tbl_name
ADD UNIQUE INDEX(a_id, b_id)

It ignores the duplicate records and maintain only the unique records.

A useful links : MySQL: ALTER IGNORE TABLE ADD UNIQUE, what will be truncated?

Upvotes: 0

BatScream
BatScream

Reputation: 19700

The task is simple:

  • Find the minimum number of records that should not be deleted.
  • Delete the other records.

The Oracle way,

delete from sample_table where id not in(
select id from
(
 Select id, success,row_number()  
 over (partition by a_id,b_id order by success desc) rown 
 from sample_table
) 
where (success = 1 and rown = 1) or rown=1)

The solution in mysql:

Will give you the minimum ids that should not be deleted.:

Select id from (SELECT * FROM report ORDER BY success desc) t 
group by  t.a_id, t.b

o/p:

ID
1
2
4
5
6

You can delete the other rows.

  delete from report where id not in (the above query)

The consolidated DML:

delete from report 
  where id not in (Select id 
                    from (SELECT * FROM report 
                          ORDER BY success desc) t 
                    group by  t.a_id, t.b_id)

Now doing a Select on report:

ID  A_ID    B_ID    SUCCESS
1   34       43     1
2   34       84     1
4   65       43     1
5   65       84     1
6   93       23     0

You can check the documentation of how the group by clause works when no aggregation function is provided:

When using this feature, all rows in each group should have the same values for the columns that are omitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.

So just performing an order by 'success before the group by would allow us to get the first duplicate row with success = 1.

Upvotes: 4

Jens Krogsboell
Jens Krogsboell

Reputation: 1123

This should work:

If procedural programming is available to you like e.g. pl/sql it is fairly simple. If you on the other hand is looking for a clean SQL solution it might be possible but not very "nice". Below is an example in pl/sql:

begin
  for x in ( select a_id, b_id
             from   table
             having count(*) > 1
             group  by a_id, b_id )
  loop
    for y in ( select *
               from   table
               where  a_id = x.a_id
               and    b_id = x.b_id
               order  by success desc )
    loop
      delete from table
      where  a_id = y.a_id
      and    b_id = y.b_id
      and    id != x.id;
      exit; // Only do the first row
    end loop;
  end loop;
end;

This is the idea: For each duplicated combination of a_id and b_id select all the instances ordered so that any with success=1 is up first. Delete all of that combination except the first - being the successful one if any.

or perhaps:

declare
  l_a_id integer := -1;
  l_b_id integer := -1;
begin
  for x in ( select *
             from   table
             order  by a_id, b_id, success desc )
  loop
    if  x.a_id = l_a_id and x.b_id = l_b_id
    then
      delete from table where  id = x.id;
    end if;
    l_a_id := x.a_id;
    l_b_id := x.b_id;
  end loop;
end;

Upvotes: 0

CRThaze
CRThaze

Reputation: 564

How about this:

CREATE TABLE new_table 
  AS (SELECT * FROM old_table WHERE 1 AND success = 1 GROUP BY a_id,b_id);
DROP TABLE old_table;
RENAME TABLE new_table TO old_table;

This method will create a new table with a temporary name, and copy all the deduped rows which have success = 1 from the old table. The old table is then dropped and the new table is renamed to the name of the old table.

If I understand your question correctly, this is probably the simplest solution. (though I don't know if it's really efficient or not)

Upvotes: 1

Related Questions