BradenA8
BradenA8

Reputation: 119

Find list of dates in a table closest to specific date from different table.

I have a list of unique ID's in one table that has a date column. Example:

     TABLE1
 ID          Date
 0           2018-01-01
 1           2018-01-05
 2           2018-01-15
 3           2018-01-06
 4           2018-01-09
 5           2018-01-12
 6           2018-01-15
 7           2018-01-02
 8           2018-01-04
 9           2018-02-25

Then in another table I have a list of different values that appear multiple times for each ID with various dates.

         TABLE 2
ID       Value       Date
0        18          2017-11-28
0        24          2017-12-29
0        28          2018-01-06
1        455         2018-01-03
1        468         2018-01-16
2        55          2018-01-03
3        100         2017-12-27
3        110         2018-01-04
3        119         2018-01-10
3        128         2018-01-30
4        223         2018-01-01
4        250         2018-01-09
4        258         2018-01-11

etc

I want to find the value in table 2 that is closest to the unique date in table 1. Sometimes table 2 does contain a value that matches the date exactly and I have had no problem in pulling through those values. But I can't work out the code to pull through the value closest to the date requested from table 1.

My desired result based on the examples above would be

ID          Value          Date
0           24             2017-12-29
1           455            2018-01-03
2           55             2018-01-03
3           110            2018-01-04
4           250            2018-01-09

Since I can easily find the ID's with an exact match, one thing I have tried is taking the ID's that don't have an exact date match and placing them with their corresponding values into a temporary table. Then trying to find the values where I need the closest possible match, but it's here that I'm not sure where to begin on the coding of that.

Apologies if I'm missing a basic function or clause for this, I'm still learning!

Upvotes: 0

Views: 306

Answers (2)

Bruno Paulino
Bruno Paulino

Reputation: 56

dude.

I'll say a couple of things here for you to consider, since SQL Server is not my comfort zone, while SQL itself is.


First of all, I'd join TABLE1 with TABLE2 per ID. That way, I can specify on my SELECT clause the following tuple:

SELECT ID, Value, DateDiff(d, T1.Date, T2.Date) qt_diff_days

Obviously, depending on the precision of the dates kept there, rather they have times or not, you can change the date field on DateDiff function.

Going forward, I'd also make this date difference an absolute number (to resolve positive / negative differences and consider only the elapsed time).

After that, and that's where it gets tricky because I don't know the SQL Server version you're using, but basically I'd use a ROW_NUMBER window function to rank all my lines per difference. Something like the following:

SELECT
  ID, Value, Abs(DateDiff(d, T1.Date, T2.Date)) qt_diff_days,
  ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Abs(DateDiff(d, T1.Date, T2.Date)) ASC) nu_row

ROW_NUMBER (Transact-SQL)

Numbers the output of a result set. More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

If you could run ROW_NUMBER properly, you should notice the query will rank it's data per ID, starting with 1 and increasing this ranking by it's difference between both dates, reseting it's rank to 1 when ID changes.


After that, all you need to do is select only those lines where nu_row equals to 1. I'd use a CTE to that.

WITH common_table_expression (Transact-SQL)

Specifies a temporary named result set, known as a common table expression (CTE).

Upvotes: 1

Thom A
Thom A

Reputation: 95571

The below would be one method:

WITH Table1 AS(
    SELECT ID, CONVERT(date, datecolumn) DateColumn
    FROM (VALUES (0,'20180101'),
                 (1,'20180105'),
                 (2,'20180115'),
                 (3,'20180106'),
                 (4,'20180109'),
                 (5,'20180112'),
                 (6,'20180115'),
                 (7,'20180102'),
                 (8,'20180104'),
                 (9,'20180225')) V(ID, DateColumn)),
Table2 AS(
    SELECT ID, [value], CONVERT(date, datecolumn) DateColumn
    FROM (VALUES (0,18 ,'2017-11-28'),
                 (0,24 ,'2017-12-29'),
                 (0,28 ,'2018-01-06'),
                 (1,455,'2018-01-03'),
                 (1,468,'2018-01-16'),
                 (2,55 ,'2018-01-03'),
                 (3,100,'2017-12-27'),
                 (3,110,'2018-01-04'),
                 (3,119,'2018-01-10'),
                 (3,128,'2018-01-30'),
                 (4,223,'2018-01-01'),
                 (4,250,'2018-01-09'),
                 (4,258,'2018-01-11')) V(ID, [Value],DateColumn))
SELECT T1.ID,
       T2.[Value],
       T2.DateColumn
FROM Table1 T1
     CROSS APPLY (SELECT TOP 1 *
                  FROM Table2 ca
                  WHERE T1.ID = ca.ID
                  ORDER BY ABS(DATEDIFF(DAY, ca.DateColumn, T1.DateColumn))) T2;

Note that if the difference is days is the same, the row returned will be random (and could differ each time the query is run). For example, if Table had the date 20180804 and Table2 had the dates 20180803 and 20180805 they would both have the value 1 for ABS(DATEDIFF(DAY, ca.DateColumn, T1.DateColumn)). You therefore might need to include additional logic in your ORDER BY to ensure consistent results.

Upvotes: 5

Related Questions