Reputation: 1193
I have table like following
id | a_id | b_id | success
--------------------------
1 34 43 1
2 34 84 1
3 34 43 0
4 65 43 1
5 65 84 1
6 93 23 0
7 93 23 0
I want delete duplicates with same a_id and b_id, but I want keep one record. If possible kept record should be with success=1. So in example table third and sixth/seventh record should be deleted. How to do this?
I'm using MySQL 5.1
Upvotes: 1
Views: 114
Reputation: 1636
In MySQL, if you dont want to care about which record is maintained, a single alter table will work.
ALTER IGNORE TABLE tbl_name
ADD UNIQUE INDEX(a_id, b_id)
It ignores the duplicate records and maintain only the unique records.
A useful links : MySQL: ALTER IGNORE TABLE ADD UNIQUE, what will be truncated?
Upvotes: 0
Reputation: 19700
The task is simple:
The Oracle way,
delete from sample_table where id not in(
select id from
(
Select id, success,row_number()
over (partition by a_id,b_id order by success desc) rown
from sample_table
)
where (success = 1 and rown = 1) or rown=1)
The solution in mysql:
Will give you the minimum ids that should not be deleted.:
Select id from (SELECT * FROM report ORDER BY success desc) t
group by t.a_id, t.b
o/p:
ID
1
2
4
5
6
You can delete the other rows.
delete from report where id not in (the above query)
The consolidated DML:
delete from report
where id not in (Select id
from (SELECT * FROM report
ORDER BY success desc) t
group by t.a_id, t.b_id)
Now doing a Select on report:
ID A_ID B_ID SUCCESS
1 34 43 1
2 34 84 1
4 65 43 1
5 65 84 1
6 93 23 0
You can check the documentation of how the group by clause works when no aggregation function is provided:
When using this feature, all rows in each group should have the same values for the columns that are omitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.
So just performing an order by 'success
before the group by would allow us to get the first duplicate row with success = 1
.
Upvotes: 4
Reputation: 1123
This should work:
If procedural programming is available to you like e.g. pl/sql it is fairly simple. If you on the other hand is looking for a clean SQL solution it might be possible but not very "nice". Below is an example in pl/sql:
begin
for x in ( select a_id, b_id
from table
having count(*) > 1
group by a_id, b_id )
loop
for y in ( select *
from table
where a_id = x.a_id
and b_id = x.b_id
order by success desc )
loop
delete from table
where a_id = y.a_id
and b_id = y.b_id
and id != x.id;
exit; // Only do the first row
end loop;
end loop;
end;
This is the idea: For each duplicated combination of a_id and b_id select all the instances ordered so that any with success=1 is up first. Delete all of that combination except the first - being the successful one if any.
or perhaps:
declare
l_a_id integer := -1;
l_b_id integer := -1;
begin
for x in ( select *
from table
order by a_id, b_id, success desc )
loop
if x.a_id = l_a_id and x.b_id = l_b_id
then
delete from table where id = x.id;
end if;
l_a_id := x.a_id;
l_b_id := x.b_id;
end loop;
end;
Upvotes: 0
Reputation: 564
How about this:
CREATE TABLE new_table
AS (SELECT * FROM old_table WHERE 1 AND success = 1 GROUP BY a_id,b_id);
DROP TABLE old_table;
RENAME TABLE new_table TO old_table;
This method will create a new table with a temporary name, and copy all the deduped rows which have success = 1 from the old table. The old table is then dropped and the new table is renamed to the name of the old table.
If I understand your question correctly, this is probably the simplest solution. (though I don't know if it's really efficient or not)
Upvotes: 1