Reputation: 151
I have duplicate records in my table with below 3 scenario:
record Adddate
22344222 2016-04-22 00:00:00.000
22344222 2016-05-06 00:00:00.000
22344222 2016-06-06 00:00:00.000
22344222 2016-06-20 00:00:00.000
22344222 2016-07-25 00:00:00.000
22344222 2016-09-26 00:00:00.000
22344222 2016-10-03 00:00:00.000
22344222 2016-10-26 00:00:00.000
22344222 2016-10-27 00:00:00.000
22344222 2016-10-28 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344224 2016-04-22 00:00:00.000
22344224 2016-04-23 00:00:00.000
22344224 2016-04-24 00:00:00.000
22344224 2016-04-25 00:00:00.000
22344224 2016-04-26 00:00:00.000
22344224 2016-06-10 00:00:00.000
I want to delete all the duplicate records except for the 2 rows where the 1st row should be the one having least add date and the 2nd row where the date difference between the adddate is having 45 days.
In the above three scenario I should be able to preserve only the below data
record Adddate
22344222 2016-04-22 00:00:00.000
22344222 2016-05-06 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344224 2016-04-22 00:00:00.000
22344224 2016-06-06 00:00:00.000
Upvotes: 1
Views: 61
Reputation: 4439
I used a table name of Test since it was not provided.
WITH cte AS
(SELECT *, NumSeq = ROW_NUMBER() OVER (PARTITION BY record ORDER BY Adddate) FROM dbo.Test)
DELETE FROM cte
WHERE cte.NumSeq > 1
AND NOT EXISTS ( SELECT 1
FROM cte AS A
INNER JOIN cte AS B
ON B.NumSeq > 1
AND DATEDIFF( DAY, A.Adddate, B.Adddate ) = 45
AND A.record = B.record
WHERE A.NumSeq = 1
AND cte.record = B.record
AND cte.Adddate = B.Adddate
);
SELECT * FROM dbo.Test;
This returns 4 rows:
22344222 2016-04-22 00:00:00.000
22344222 2016-06-06 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344224 2016-04-22 00:00:00.000
Edit: Note the desired results shows 5/6 as the 2nd row for 22344222, but that is not 45 days after 4/22. My result returns the row for 6/6. Also, if I add the 6/6 date for 22344224 to the source data, my result will return 5 rows instead.
Upvotes: 0
Reputation: 146499
try this:
With mad(record, minDat) as
(Select record, min(addDate)
From myTable
group by record)
Delete t
from mytable t join mad m
on m.record = t.Record
where t.adddate not in
(m.minDat, dateadd(day, 45, m.minDat))
problem is you have 13 records in the source data for record 22344223
that are all the same.
If you only want one copy of these 13 duplicates, then, after deleting the records,
create table dbo.temp (record integer, addDate date)
Insert dbo.temp(record, addDate)
Select distinct record, addDate
from mytable
-- ------------------------
Drop table myTable
-- ------------------------
exec sp_Rename 'dbo.temp', 'dbo.mytable'
Upvotes: 5