user3904868
user3904868

Reputation:

SQL query to return closest date match for each Id

I'm trying to write a query (SQL server) which will return a full row for the closest date value of each unique Id.

For example, if there were 20,000 records and 100 unique Ids I want 100 records returned for each unique Id which is closest to the date value.

All data is in a single table

The following I've tried which don't work

SELECT TOP(1) [Id]
      ,[Updated]
      ,[LoadTime]
      ,[Field4]
      ,[Field5]
      ,[Field6]
  FROM [dbo].[tblTempData]
WHERE [Updated] <= [LoadTime]

This only returns 1 record rather than a single record for each Id.

SELECT DISTINCT [Id]
      ,[Updated]
      ,[LoadTime]
      ,[Field4]
      ,[Field5]
      ,[Field6]
  FROM [dbo].[tblTempData]
WHERE [Updated] <= [LoadTime]

This doesn't work as none of the other fields are distinct so I get multiple records with the same Id

Edit - example data and output expected enter image description here

Upvotes: 2

Views: 2105

Answers (4)

Serg
Serg

Reputation: 22811

Sql-server has a nice shortcut for ordering by row_number()

SELECT TOP(1) WITH TIES [Id]
      ,[Updated]
      ,[LoadTime]
      ,[Field4]
      ,[Field5]
      ,[Field6]
FROM [dbo].[tblTempData]
WHERE [Updated] <= [LoadTime]
ORDER BY row_number() over(PARTITION BY [Id] ORDER BY datediff(second,[Updated],[LoadTime] )

Upvotes: 1

S3S
S3S

Reputation: 25112

Depending on what your definition of "closest time" is, this is one way which will return one record for each ID where the LoadTime and Updated is closest to each other, which would essentially be the first record that was updated.

;with cte as(
SELECT [Id]
      ,[Updated]
      ,[LoadTime]
      ,[Field4]
      ,[Field5]
      ,[Field6]
      ,ClosestTime = datediff(second,LoadTime, Updated)
  FROM [dbo].[tblTempData]),

select
    [Id]
    ,[Updated]
    ,[LoadTime]
    ,[Field4]
    ,[Field5]
    ,[Field6]   
from
    cte
inner join
    (select Id, min(ClosestTime) dt
     from cte
     group by Id) cte2 on cte.Id = cte2.Id and cte2.dt = cte.ClosestTime

Another way is using row_number()

;with cte as(
SELECT [Id]
        ,[Updated]
        ,[LoadTime]
        ,[Field4]
        ,[Field5]
        ,[Field6]
        ,RN = row_number() over (partition by Id order by LoadTime, Updated)
FROM [dbo].[tblTempData])

select 
    [Id]
    ,[Updated]
    ,[LoadTime]
    ,[Field4]
    ,[Field5]
    ,[Field6]
from
    cte
where
    RN = 1

Upvotes: 0

Jodrell
Jodrell

Reputation: 35716

how about,

SELECT
   [Id]
  ,[Updated]
  ,[LoadTime]
  ,[Field4]
  ,[Field5]
  ,[Field6]
FROM
(
    SELECT
       RANK() OVER (PARTITION BY [Id] ORDER BY DATEDIFF(ms, [Update], [LoadTime] ASC) [R]
      ,[Id]
      ,[Updated]
      ,[LoadTime]
      ,[Field4]
      ,[Field5]
      ,[Field6]
  FROM
       [dbo].[tblTempData]
) O
WHERE
    O.[R] = 1;

Note, rows that were delayed the same number of milliseconds will both be included. If you want to randomly discard one use ROW_NUMBER() instead.

Upvotes: 0

tsdogs
tsdogs

Reputation: 36

Probably the best way is to first get the record with the closest LoadTime you are looking for and then query thoose records (if you though have 2 records with the same LoadTime and Id this would generate 2 records for that Id)

SELECT * FROM [dbo].[tblTempData] a
INNER JOIN (
   SELECT Id,Max(LoadTime) FROM [dbo].[tblTempData]
     WHERE [Updated]<=[LoadTime]
     GROUP BY Id,Max(LoadTime)) b 
   ON (a.Id=b.Id AND a.LoadTime=b.LoadTime)

should do the work.

Upvotes: 0

Related Questions