Reputation: 77
I know this sounds like a duplicate question but I've read a bunch of others that seem related and still can't quite figure out what my statement should be. Hopefully my question isn't closed prematurely as a duplicate.
I have a table calls
with a number of columns but the ones I want to use 3 columns to find exact matches: call_start_date
, call_from
and call_to
and then use the column named call_duration
to use as the deciding factor of whether or not to keep the row. Only the higher time in the call_duration
should be kept.
This query works to find the duplicates but how do I modify this query to delete the lower-numbered call_duration
results?
select t.*
from calls t join
(select call_start_date, call_from, call_to, count(*) as NumDuplicates
from calls
group by call_start_date, call_from, call_to
having NumDuplicates > 1
) tsum
on t.call_start_date = tsum.call_start_date and t.call_from = tsum.call_from and t.call_to = tsum.call_to;
Here's my database structure:
CREATE TABLE `calls` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`call_from` varchar(15) COLLATE utf8_unicode_ci NOT NULL,
`call_to` varchar(15) COLLATE utf8_unicode_ci NOT NULL,
`call_start_date` datetime NOT NULL,
`call_direction` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`user` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`queue_name` varchar(100) COLLATE utf8_unicode_ci DEFAULT '',
`call_duration` time DEFAULT NULL,
`call_result` varchar(100) COLLATE utf8_unicode_ci DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=25001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
This is a screenshot example of a duplicate. I just want to keep the one with the higher number in the call_duration
column:
Upvotes: 2
Views: 466
Reputation: 35323
Consider this for your select. then just change the select *
to delete A
after confirming this gives the records you want to delete.
Think of it this way: you want a set of data with the max call duration Grouped by your 3 fields. Then, join this set on those 3 fields AND the max duration back to the base set to get all the unique calls joined to their max call duration. Any records not matching that max duration would have a null join; and thus those are the ones you want to delete.
SELECT *
FROM CALLS A
LEFT JOIN (SELECT call_from, call_to, Call_start_date, max(Call_Duration) mCD, min(ID) MID
FROM calls
GROUP BY call_from, call_to, Call_start_date) B
on A.Call_from = B.call_from
and A.Call_to = B.call_to
and A.Call_start_date = B.call_Start_date
and A.call_duration = B.MCD
and A.ID = B.MID
WHERE B.mcd is null
So the delete would be: UPDATED WORKING DEMO:
DELETE A
FROM CALLS A
LEFT JOIN (SELECT call_from, call_to, Call_start_date, max(Call_Duration) mCD, min(ID) mid
FROM CALLS Z
GROUP BY call_from, call_to, Call_start_date) B
on A.Call_from = B.call_from
and A.Call_to = B.call_to
and A.Call_start_date = B.call_Start_date
and A.call_duration = B.MCD
and A.ID = B.MID
WHERE B.mcd is null;
Upvotes: 2