Reputation: 119
I have a list of unique ID's in one table that has a date column. Example:
TABLE1
ID Date
0 2018-01-01
1 2018-01-05
2 2018-01-15
3 2018-01-06
4 2018-01-09
5 2018-01-12
6 2018-01-15
7 2018-01-02
8 2018-01-04
9 2018-02-25
Then in another table I have a list of different values that appear multiple times for each ID with various dates.
TABLE 2
ID Value Date
0 18 2017-11-28
0 24 2017-12-29
0 28 2018-01-06
1 455 2018-01-03
1 468 2018-01-16
2 55 2018-01-03
3 100 2017-12-27
3 110 2018-01-04
3 119 2018-01-10
3 128 2018-01-30
4 223 2018-01-01
4 250 2018-01-09
4 258 2018-01-11
etc
I want to find the value in table 2 that is closest to the unique date in table 1. Sometimes table 2 does contain a value that matches the date exactly and I have had no problem in pulling through those values. But I can't work out the code to pull through the value closest to the date requested from table 1.
My desired result based on the examples above would be
ID Value Date
0 24 2017-12-29
1 455 2018-01-03
2 55 2018-01-03
3 110 2018-01-04
4 250 2018-01-09
Since I can easily find the ID's with an exact match, one thing I have tried is taking the ID's that don't have an exact date match and placing them with their corresponding values into a temporary table. Then trying to find the values where I need the closest possible match, but it's here that I'm not sure where to begin on the coding of that.
Apologies if I'm missing a basic function or clause for this, I'm still learning!
Upvotes: 0
Views: 306
Reputation: 56
dude.
I'll say a couple of things here for you to consider, since SQL Server is not my comfort zone, while SQL itself is.
First of all, I'd join TABLE1 with TABLE2 per ID. That way, I can specify on my SELECT
clause the following tuple:
SELECT ID, Value, DateDiff(d, T1.Date, T2.Date) qt_diff_days
Obviously, depending on the precision of the dates kept there, rather they have times or not, you can change the date field on DateDiff
function.
Going forward, I'd also make this date difference an absolute number (to resolve positive / negative differences and consider only the elapsed time).
After that, and that's where it gets tricky because I don't know the SQL Server version you're using, but basically I'd use a ROW_NUMBER
window function to rank all my lines per difference. Something like the following:
SELECT
ID, Value, Abs(DateDiff(d, T1.Date, T2.Date)) qt_diff_days,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Abs(DateDiff(d, T1.Date, T2.Date)) ASC) nu_row
Numbers the output of a result set. More specifically, returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
If you could run ROW_NUMBER
properly, you should notice the query will rank it's data per ID, starting with 1
and increasing this ranking by it's difference between both dates, reseting it's rank to 1
when ID changes.
After that, all you need to do is select only those lines where nu_row
equals to 1
. I'd use a CTE to that.
WITH common_table_expression (Transact-SQL)
Specifies a temporary named result set, known as a common table expression (CTE).
Upvotes: 1
Reputation: 95571
The below would be one method:
WITH Table1 AS(
SELECT ID, CONVERT(date, datecolumn) DateColumn
FROM (VALUES (0,'20180101'),
(1,'20180105'),
(2,'20180115'),
(3,'20180106'),
(4,'20180109'),
(5,'20180112'),
(6,'20180115'),
(7,'20180102'),
(8,'20180104'),
(9,'20180225')) V(ID, DateColumn)),
Table2 AS(
SELECT ID, [value], CONVERT(date, datecolumn) DateColumn
FROM (VALUES (0,18 ,'2017-11-28'),
(0,24 ,'2017-12-29'),
(0,28 ,'2018-01-06'),
(1,455,'2018-01-03'),
(1,468,'2018-01-16'),
(2,55 ,'2018-01-03'),
(3,100,'2017-12-27'),
(3,110,'2018-01-04'),
(3,119,'2018-01-10'),
(3,128,'2018-01-30'),
(4,223,'2018-01-01'),
(4,250,'2018-01-09'),
(4,258,'2018-01-11')) V(ID, [Value],DateColumn))
SELECT T1.ID,
T2.[Value],
T2.DateColumn
FROM Table1 T1
CROSS APPLY (SELECT TOP 1 *
FROM Table2 ca
WHERE T1.ID = ca.ID
ORDER BY ABS(DATEDIFF(DAY, ca.DateColumn, T1.DateColumn))) T2;
Note that if the difference is days is the same, the row returned will be random (and could differ each time the query is run). For example, if Table
had the date 20180804
and Table2
had the dates 20180803
and 20180805
they would both have the value 1
for ABS(DATEDIFF(DAY, ca.DateColumn, T1.DateColumn))
. You therefore might need to include additional logic in your ORDER BY
to ensure consistent results.
Upvotes: 5