Kuris
Kuris

Reputation: 13

SQL Server - Delete older datetime entries if date appears more than once

I have a table that will sometimes have duplicate entries for the same day but different contain a different insert_time based on when the data was ingested into the database. I am looking for a way to remove earlier entries that contain the same date but an earlier time if a record is inserted more than once on that day.

What I have:

    ID  insert_time
1   2020-04-26 04:23:04.923
2   2020-04-26 04:23:04.923
3   2020-04-26 04:23:04.923
1   2020-04-28 12:13:04.923
2   2020-04-28 12:13:04.923
3   2020-04-28 12:13:04.923
1   2020-04-28 13:13:04.923
2   2020-04-28 13:13:04.923
1   2020-04-29 10:16:03.876
2   2020-04-29 10:16:03.876
3   2020-04-29 10:16:03.876

What I want:

ID  insert_time
1   2020-04-26 04:23:04.923
2   2020-04-26 04:23:04.923
3   2020-04-26 04:23:04.923
3   2020-04-28 12:13:04.923
1   2020-04-28 13:13:04.923
2   2020-04-28 13:13:04.923
1   2020-04-29 10:16:03.876
2   2020-04-29 10:16:03.876
3   2020-04-29 10:16:03.876

The current query I'm working on is:

SELECT * FROM my_table
WHERE insert_time
   IN (SELECT MIN(insert_time)
       FROM my_table GROUP BY [id])

but it only returns records with the earliest insert_time in the entire table. I'm stuck trying to figure out how to return rows that have more than one insert_time of the same date and then selecting the earlier of them.

Upvotes: 0

Views: 874

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269823

Your method would work if you treated it as a correlated clause:

SELECT t.*
FROM my_table t
WHERE t.insert_time = (SELECT MIN(t2.insert_time)
                       FROM my_table t2
                       WHERE t2.id = t.id AND
                             CONVERT(DATE, t2.insert_time) = CONVERT(DATE, t.insert_time)
                      );

Upvotes: 0

anon
anon

Reputation:

I prefer to use window functions for this simply because they scale linearly. With self-joins you get exponential effects.

;WITH cte AS 
(
  SELECT ID, insert_time, rn = ROW_NUMBER() OVER 
    (PARTITION BY ID, CONVERT(date, insert_time) 
     ORDER BY insert_time DESC)
  FROM dbo.source_table
)
SELECT ID, insert_time 
  FROM cte WHERE rn = 1
  ORDER BY insert_time, ID;

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521279

You could use exists logic here:

DELETE
FROM my_table t1
WHERE EXISTS (SELECT 1 FROM my_table t2
              WHERE t2.ID = t1.ID AND
                    CONVERT(date, t2.insert_time) =
                    CONVERT(date, t1.insert_time) AND
                    t2.insert_time > t1.insert_time);

Read in plain English, the above query says to delete a record for which we can find another record with the same ID, same date, but a later timestamp.

Upvotes: 0

Related Questions