Mike Stecker
Mike Stecker

Reputation: 77

Delete Duplicate MySQL rows but keep one

I know this sounds like a duplicate question but I've read a bunch of others that seem related and still can't quite figure out what my statement should be. Hopefully my question isn't closed prematurely as a duplicate.

I have a table calls with a number of columns but the ones I want to use 3 columns to find exact matches: call_start_date, call_from and call_to and then use the column named call_duration to use as the deciding factor of whether or not to keep the row. Only the higher time in the call_duration should be kept.

This query works to find the duplicates but how do I modify this query to delete the lower-numbered call_duration results?

select t.*
from calls t join
 (select call_start_date, call_from, call_to, count(*) as NumDuplicates
  from calls
  group by call_start_date, call_from, call_to
  having NumDuplicates > 1
 ) tsum
 on t.call_start_date = tsum.call_start_date and t.call_from = tsum.call_from and t.call_to = tsum.call_to;

Here's my database structure:

CREATE TABLE `calls` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`call_from` varchar(15) COLLATE utf8_unicode_ci NOT NULL,
`call_to` varchar(15) COLLATE utf8_unicode_ci NOT NULL,
`call_start_date` datetime NOT NULL,
`call_direction` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`user` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`queue_name` varchar(100) COLLATE utf8_unicode_ci DEFAULT '',
`call_duration` time DEFAULT NULL,
`call_result` varchar(100) COLLATE utf8_unicode_ci DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=25001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

This is a screenshot example of a duplicate. I just want to keep the one with the higher number in the call_duration column: enter image description here

Upvotes: 2

Views: 466

Answers (1)

xQbert
xQbert

Reputation: 35323

Consider this for your select. then just change the select * to delete A after confirming this gives the records you want to delete.

Think of it this way: you want a set of data with the max call duration Grouped by your 3 fields. Then, join this set on those 3 fields AND the max duration back to the base set to get all the unique calls joined to their max call duration. Any records not matching that max duration would have a null join; and thus those are the ones you want to delete.

SELECT * 
FROM CALLS A
LEFT JOIN (SELECT call_from, call_to, Call_start_date, max(Call_Duration) mCD, min(ID) MID
            FROM calls
            GROUP BY call_from, call_to, Call_start_date) B
 on A.Call_from = B.call_from
and A.Call_to = B.call_to
and A.Call_start_date = B.call_Start_date
and A.call_duration = B.MCD
and A.ID = B.MID
WHERE B.mcd is null

So the delete would be: UPDATED WORKING DEMO:

DELETE A
FROM CALLS A
LEFT JOIN (SELECT call_from, call_to, Call_start_date, max(Call_Duration) mCD, min(ID) mid
            FROM CALLS Z
            GROUP BY call_from, call_to, Call_start_date) B
 on A.Call_from = B.call_from
and A.Call_to = B.call_to
and A.Call_start_date = B.call_Start_date
and A.call_duration = B.MCD
and A.ID = B.MID
WHERE B.mcd is null;

Upvotes: 2

Related Questions