Ratha
Ratha

Reputation: 9692

Remove duplicate records based on timestamp

I'm writing a query to find duplicate records. I have table with following columns

Id, Deliveries, TankId, Timestamp.

I have inserted duplicate records, that is for same tankid, same deliveries with the +1 day offset timestamp.

Now I want to remove duplicate records which is with lesser timestamp.

e.g. I have duplicate deliveries added for same tankid on 24th and 25th july. I need to remove 24th record.

I tried the following query;

SELECT raw.TimeStamp,raw.[Delivery],raw.[TankId]
FROM  [dbo].[tObservationData] raw
INNER JOIN (
    SELECT [Delivery],[TankSystemId]
    FROM [dbo].[ObservationData] 
    GROUP BY [Delivery],[TankSystemId]
    HAVING COUNT([ObservationDataId]) > 1
    ) dup 
    ON raw.[Delivery] = dup.[Delivery] AND raw.[TankId] = dup.[TankId]
    AND  raw.TimeStamp >'2019-06-30 00:00:00.0000000' AND raw.[DeliveryL]>0
ORDER BY [TankSystemId],TimeStamp

But above gives other records too, how can I find and delete those duplicate records?

Upvotes: 1

Views: 595

Answers (3)

Ritika
Ritika

Reputation: 104

In this case you can use partition by order by clause. You can partition by TankID and Delivery and order by Timestamp in desc order

Select * from (
Select *,ROW_NUMBER() OVER (PARTITION BY TankID,Delievry ORDER BY [Timestamp] DESC) AS rn
from [dbo].[ObservationData]
) 
where rn = 1

In the above code records with rn=1 will have the latest timestamp. So you can only select those and ignore others. Also you can use the same to remove/delete the records from you table.

WITH TempObservationdata (TankID,Delivery,Timestamp)
AS
(
SELECT TankID,Delivery,ROW_NUMBER() OVER(PARTITION by TankID, Delivery ORDER BY Timsetamp desc) 
AS Timestamp
FROM dbo.ObservationData
)
--Now Delete Duplicate Rows
DELETE FROM TempObservationdata 
WHERE Timestamp > 1

Upvotes: 1

Y.K.
Y.K.

Reputation: 692

think it will work

SELECT raw.TimeStamp,raw.[Delivery],raw.[TankId]
FROM  [dbo].[tObservationData] raw
INNER JOIN (
    SELECT [Delivery],[TankSystemId],min([TimeStamp]) as min_ts
    FROM [dbo].[ObservationData] 
    GROUP BY [Delivery],[TankSystemId]
    HAVING COUNT([ObservationDataId]) > 1
    ) dup 
    ON raw.[Delivery] = dup.[Delivery] AND raw.[TankId] = dup.[TankId] and raw.[TimeStamp] = dup.min_ts
    AND  raw.TimeStamp >'2019-06-30 00:00:00.0000000' AND raw.[DeliveryL]>0
ORDER BY [TankSystemId],TimeStamp

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269823

Are you just looking for this?

SELECT od.*
FROM (SELECT od.*,
             ROW_NUMBER() OVER (PARTITION BY od.TankId, od.Delivery ORDER BY od.TimeStamp DESC) as seqnum
      FROM [dbo].[tObservationData] od
     ) od
WHERE seqnum = 1;

Upvotes: 1

Related Questions