PythonDeveloper
PythonDeveloper

Reputation: 404

Join on closest date in SQL

Hi I have two tables which I have to join on ID field and then on closest date ( only if the date is within one hour range more or less), the reason is both tables are not updated at same time so time is little off between the two ( but max delay is 1 hr). Please see below example:

Table 1:

ID Start_Date         End_Date
1  1/14/2021 14:34    1/27/2021 10:31
1  2/4/2021 10:40     7/2/2021 13:01
2  8/2/2020 00:04      9/7/2020 11:26
2  11/4/2020 9:24     2/8/2021 16:22

Table 2:

ID Start_date         End_Date
1  1/14/2021 13:47    1/27/2021 10:24
2  8/1/2020 23:57      9/7/2020 11:22
2  11/4/2020 9:12     2/8/2021 16:20

In this example: record 1 from table 1 should join to record 1 from table2.

records 2 from table 1 should not join to table 2 ( since there is no record within 1 hr start date for that ID) record 3,4 should join to record 3,4 in table2 respectively.

Upvotes: 2

Views: 2662

Answers (2)

Adrian White
Adrian White

Reputation: 1804

Another option might be:

select *  from  table_1 inner join table_2 on table_1.id = table_2.id 
where      DATEADD(hour,1,table_1.d_1)   between table_2.d_1 and table_2.d_2  
or         DATEADD(hour,-1,table_1.d_2)  between table_2.d_1 and table_2.d_2

enter image description here

Full code you can cut/paste into snowflake and run :-)

with table_1 as ( 
select 1 id,    TO_TIMESTAMP('1/14/2021 14:34',  'mm/dd/yyyy hh24:mi' ) d_1,     TO_TIMESTAMP('1/27/2021 10:31',  'mm/dd/yyyy hh24:mi' )  d_2 
union select 1 id,    TO_TIMESTAMP('2/4/2021 10:40',  'mm/dd/yyyy hh24:mi' )  d_1,     TO_TIMESTAMP('7/2/2021 13:01',  'mm/dd/yyyy hh24:mi' )  d_2 
union select 2 id,    TO_TIMESTAMP('8/2/2020 00:04',  'mm/dd/yyyy hh24:mi' )  d_1,     TO_TIMESTAMP('9/7/2020 11:26',  'mm/dd/yyyy hh24:mi' )  d_2 
union select 2 id,    TO_TIMESTAMP('11/4/2020 9:24',  'mm/dd/yyyy hh24:mi' )  d_1,     TO_TIMESTAMP('2/8/2021 16:22',  'mm/dd/yyyy hh24:mi' )  d_2 ) 
, table_2 as ( 
select 1 id,    TO_TIMESTAMP('1/14/2021 13:47',  'mm/dd/yyyy hh24:mi' ) 
d_1,     TO_TIMESTAMP('1/27/2021 10:24',  'mm/dd/yyyy hh24:mi' )  d_2 
union select 2 id,    TO_TIMESTAMP('8/1/2020 23:57',  'mm/dd/yyyy hh24:mi' )  d_1,     TO_TIMESTAMP('9/7/2020 11:22',  'mm/dd/yyyy hh24:mi' )  d_2 
union select 2 id,    TO_TIMESTAMP('11/4/2020 9:12',  'mm/dd/yyyy hh24:mi' )  d_1,     TO_TIMESTAMP(' 2/8/2021 16:20',  'mm/dd/yyyy hh24:mi' )  d_2   ) 

select *  from  table_1 inner join table_2 on table_1.id = table_2.id 
where      DATEADD(hour,1,table_1.d_1)   between table_2.d_1 and table_2.d_2  
or         DATEADD(hour,-1,table_1.d_2)  between table_2.d_1 and table_2.d_2 

Upvotes: 0

The Impaler
The Impaler

Reputation: 48770

Disclaimer: I'm no expert on Snowflake and I don't have any way of testing the solution described below.

Assuming you can use standard SQL in Snowflake the query below pre-computes the time difference between the starting dates of the possibly related rows, and ranks them from lowest to highest. The lowest (minimal time difference) wins the join.

You can do:

select *
from (
  select
    a.*,
    b.*,
    row_number() over(
      partition by a.id
      order by abs(timediff(second, a.start_date, b.start_date))
    ) as rn
  from table1 a
  join table2 b on a.id = b.id
   and a.start_date between dateadd(hour, -1, b.start_date) 
                        and dateadd(hour, 1, b.start_date)
) x
where rn = 1

Upvotes: 2

Related Questions