Reputation: 404
Hi I have two tables which I have to join on ID field and then on closest date ( only if the date is within one hour range more or less), the reason is both tables are not updated at same time so time is little off between the two ( but max delay is 1 hr). Please see below example:
Table 1:
ID Start_Date End_Date
1 1/14/2021 14:34 1/27/2021 10:31
1 2/4/2021 10:40 7/2/2021 13:01
2 8/2/2020 00:04 9/7/2020 11:26
2 11/4/2020 9:24 2/8/2021 16:22
Table 2:
ID Start_date End_Date
1 1/14/2021 13:47 1/27/2021 10:24
2 8/1/2020 23:57 9/7/2020 11:22
2 11/4/2020 9:12 2/8/2021 16:20
In this example: record 1 from table 1 should join to record 1 from table2.
records 2 from table 1 should not join to table 2 ( since there is no record within 1 hr start date for that ID) record 3,4 should join to record 3,4 in table2 respectively.
Upvotes: 2
Views: 2662
Reputation: 1804
Another option might be:
select * from table_1 inner join table_2 on table_1.id = table_2.id
where DATEADD(hour,1,table_1.d_1) between table_2.d_1 and table_2.d_2
or DATEADD(hour,-1,table_1.d_2) between table_2.d_1 and table_2.d_2
Full code you can cut/paste into snowflake and run :-)
with table_1 as (
select 1 id, TO_TIMESTAMP('1/14/2021 14:34', 'mm/dd/yyyy hh24:mi' ) d_1, TO_TIMESTAMP('1/27/2021 10:31', 'mm/dd/yyyy hh24:mi' ) d_2
union select 1 id, TO_TIMESTAMP('2/4/2021 10:40', 'mm/dd/yyyy hh24:mi' ) d_1, TO_TIMESTAMP('7/2/2021 13:01', 'mm/dd/yyyy hh24:mi' ) d_2
union select 2 id, TO_TIMESTAMP('8/2/2020 00:04', 'mm/dd/yyyy hh24:mi' ) d_1, TO_TIMESTAMP('9/7/2020 11:26', 'mm/dd/yyyy hh24:mi' ) d_2
union select 2 id, TO_TIMESTAMP('11/4/2020 9:24', 'mm/dd/yyyy hh24:mi' ) d_1, TO_TIMESTAMP('2/8/2021 16:22', 'mm/dd/yyyy hh24:mi' ) d_2 )
, table_2 as (
select 1 id, TO_TIMESTAMP('1/14/2021 13:47', 'mm/dd/yyyy hh24:mi' )
d_1, TO_TIMESTAMP('1/27/2021 10:24', 'mm/dd/yyyy hh24:mi' ) d_2
union select 2 id, TO_TIMESTAMP('8/1/2020 23:57', 'mm/dd/yyyy hh24:mi' ) d_1, TO_TIMESTAMP('9/7/2020 11:22', 'mm/dd/yyyy hh24:mi' ) d_2
union select 2 id, TO_TIMESTAMP('11/4/2020 9:12', 'mm/dd/yyyy hh24:mi' ) d_1, TO_TIMESTAMP(' 2/8/2021 16:20', 'mm/dd/yyyy hh24:mi' ) d_2 )
select * from table_1 inner join table_2 on table_1.id = table_2.id
where DATEADD(hour,1,table_1.d_1) between table_2.d_1 and table_2.d_2
or DATEADD(hour,-1,table_1.d_2) between table_2.d_1 and table_2.d_2
Upvotes: 0
Reputation: 48770
Disclaimer: I'm no expert on Snowflake and I don't have any way of testing the solution described below.
Assuming you can use standard SQL in Snowflake the query below pre-computes the time difference between the starting dates of the possibly related rows, and ranks them from lowest to highest. The lowest (minimal time difference) wins the join.
You can do:
select *
from (
select
a.*,
b.*,
row_number() over(
partition by a.id
order by abs(timediff(second, a.start_date, b.start_date))
) as rn
from table1 a
join table2 b on a.id = b.id
and a.start_date between dateadd(hour, -1, b.start_date)
and dateadd(hour, 1, b.start_date)
) x
where rn = 1
Upvotes: 2