Reputation: 39
+--------------+------------+-------------------+------------+
| id | DATA_ID | SourceID | DateAdded |
+--------------+------------+-------------------+------------+
| 1 | 304019032 | 1 | 2018-01-20|
| 2 | 345583556 | 1 | 2018-01-21|
| 3 | 299951717 | 3 | 2018-01-23|
| 4 | 304019032 | 2 | 2018-01-24|
| 5 | 298519282 | 2 | 2018-01-24|
| 6 | 299951717 | 3 | 2018-01-27|
| 7 | 345583556 | 1 | 2018-01-27|
+--------------+------------+-------------------+------------+
I'm trying to delete duplicate rows that have the same DATA_ID and SourceID, leaving the row with the most recent date.
Upvotes: 1
Views: 97
Reputation: 1269443
In SQL Server, I like to use row_number()
and an updatable CTE for this:
with todelete as (
select t.*, row_number() over (partition by sourceid, dataid order by dateadded desc) as seqnum
from t
)
delete from todelete
where seqnum > 1;
This will work even when sourceid
/dataid
is NULL
(which is probably not the case for you). It also does not assume that dateadded
and id
are mutually increasing (although that might be a reasonable assumption).
Upvotes: 2
Reputation: 204746
delete from your_table
where id not in
(
select max(id)
from your_table
group by data_id, sourceID
)
Upvotes: 0