user1415173
user1415173

Reputation: 151

delete specific duplicate records sql server 2008

I have duplicate records in my table with below 3 scenario:

record      Adddate
22344222    2016-04-22 00:00:00.000
22344222    2016-05-06 00:00:00.000
22344222    2016-06-06 00:00:00.000
22344222    2016-06-20 00:00:00.000
22344222    2016-07-25 00:00:00.000
22344222    2016-09-26 00:00:00.000
22344222    2016-10-03 00:00:00.000
22344222    2016-10-26 00:00:00.000
22344222    2016-10-27 00:00:00.000
22344222    2016-10-28 00:00:00.000

22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000

22344224    2016-04-22 00:00:00.000
22344224    2016-04-23 00:00:00.000
22344224    2016-04-24 00:00:00.000
22344224    2016-04-25 00:00:00.000
22344224    2016-04-26 00:00:00.000
22344224    2016-06-10 00:00:00.000

I want to delete all the duplicate records except for the 2 rows where the 1st row should be the one having least add date and the 2nd row where the date difference between the adddate is having 45 days.

In the above three scenario I should be able to preserve only the below data

record      Adddate
22344222    2016-04-22 00:00:00.000
22344222    2016-05-06 00:00:00.000

22344223    2016-04-22 00:00:00.000

22344224    2016-04-22 00:00:00.000
22344224    2016-06-06 00:00:00.000

Upvotes: 1

Views: 61

Answers (2)

Wes H
Wes H

Reputation: 4439

I used a table name of Test since it was not provided.

WITH cte AS
(SELECT   *, NumSeq = ROW_NUMBER() OVER (PARTITION BY record ORDER BY Adddate) FROM   dbo.Test)
DELETE  FROM cte
WHERE cte.NumSeq > 1
    AND NOT EXISTS ( SELECT 1
                       FROM cte AS A
                       INNER JOIN cte AS B
                                       ON B.NumSeq > 1
                                          AND   DATEDIFF( DAY, A.Adddate, B.Adddate ) = 45
                                          AND   A.record = B.record
                       WHERE A.NumSeq = 1
                             AND cte.record = B.record
                             AND cte.Adddate = B.Adddate
                   );

SELECT  * FROM  dbo.Test;

This returns 4 rows:

22344222 2016-04-22 00:00:00.000

22344222 2016-06-06 00:00:00.000

22344223 2016-04-22 00:00:00.000

22344224 2016-04-22 00:00:00.000

Edit: Note the desired results shows 5/6 as the 2nd row for 22344222, but that is not 45 days after 4/22. My result returns the row for 6/6. Also, if I add the 6/6 date for 22344224 to the source data, my result will return 5 rows instead.

Upvotes: 0

Charles Bretana
Charles Bretana

Reputation: 146499

try this:

With mad(record, minDat) as 
   (Select record, min(addDate)
    From myTable
    group by record)
Delete t
from mytable t join mad m 
   on m.record = t.Record
where t.adddate not in 
   (m.minDat, dateadd(day, 45, m.minDat))

problem is you have 13 records in the source data for record 22344223 that are all the same.

If you only want one copy of these 13 duplicates, then, after deleting the records,

create table dbo.temp (record integer, addDate date)
Insert dbo.temp(record, addDate)
Select distinct record, addDate
from mytable
-- ------------------------
Drop table myTable
-- ------------------------
exec sp_Rename 'dbo.temp', 'dbo.mytable'   

Upvotes: 5

Related Questions