Reputation: 13
I have a table that will sometimes have duplicate entries for the same day but different contain a different insert_time based on when the data was ingested into the database. I am looking for a way to remove earlier entries that contain the same date but an earlier time if a record is inserted more than once on that day.
What I have:
ID insert_time
1 2020-04-26 04:23:04.923
2 2020-04-26 04:23:04.923
3 2020-04-26 04:23:04.923
1 2020-04-28 12:13:04.923
2 2020-04-28 12:13:04.923
3 2020-04-28 12:13:04.923
1 2020-04-28 13:13:04.923
2 2020-04-28 13:13:04.923
1 2020-04-29 10:16:03.876
2 2020-04-29 10:16:03.876
3 2020-04-29 10:16:03.876
What I want:
ID insert_time
1 2020-04-26 04:23:04.923
2 2020-04-26 04:23:04.923
3 2020-04-26 04:23:04.923
3 2020-04-28 12:13:04.923
1 2020-04-28 13:13:04.923
2 2020-04-28 13:13:04.923
1 2020-04-29 10:16:03.876
2 2020-04-29 10:16:03.876
3 2020-04-29 10:16:03.876
The current query I'm working on is:
SELECT * FROM my_table
WHERE insert_time
IN (SELECT MIN(insert_time)
FROM my_table GROUP BY [id])
but it only returns records with the earliest insert_time in the entire table. I'm stuck trying to figure out how to return rows that have more than one insert_time of the same date and then selecting the earlier of them.
Upvotes: 0
Views: 874
Reputation: 1269823
Your method would work if you treated it as a correlated clause:
SELECT t.*
FROM my_table t
WHERE t.insert_time = (SELECT MIN(t2.insert_time)
FROM my_table t2
WHERE t2.id = t.id AND
CONVERT(DATE, t2.insert_time) = CONVERT(DATE, t.insert_time)
);
Upvotes: 0
Reputation:
I prefer to use window functions for this simply because they scale linearly. With self-joins you get exponential effects.
;WITH cte AS
(
SELECT ID, insert_time, rn = ROW_NUMBER() OVER
(PARTITION BY ID, CONVERT(date, insert_time)
ORDER BY insert_time DESC)
FROM dbo.source_table
)
SELECT ID, insert_time
FROM cte WHERE rn = 1
ORDER BY insert_time, ID;
Upvotes: 1
Reputation: 521279
You could use exists logic here:
DELETE
FROM my_table t1
WHERE EXISTS (SELECT 1 FROM my_table t2
WHERE t2.ID = t1.ID AND
CONVERT(date, t2.insert_time) =
CONVERT(date, t1.insert_time) AND
t2.insert_time > t1.insert_time);
Read in plain English, the above query says to delete a record for which we can find another record with the same ID
, same date, but a later timestamp.
Upvotes: 0