How to perform SQL join on lag date in Snowflake?

Question

I have a table with charges and user id's and a separate table with plan changes. I need to join these tables in such a way that I know what the user's plan id was at the time of the charge in Snowflake.

Charge Table:

Charge_ID,User_ID,Inserted_At
AAAA     ,1234   ,2022-01-01 15:00:00
AAAA     ,1234   ,2022-02-01 15:00:00
BBBB     ,5678   ,2022-01-05 18:00:00
BBBB     ,5678   ,2022-02-07 18:00:00

Plan Table:

User_ID,Plan_ID,Inserted_At
1234   ,100    ,2022-01-01 13:00:00
1234   ,099    ,2022-01-01 14:00:00
1234   ,101    ,2022-01-18 13:00:00
5678   ,050    ,2022-01-04 13:00:00
5678   ,051    ,2022-02-08 13:00:00

Result:

Charge_ID,User_ID,Charge_Inserted_At  ,Plan_ID
AAAA     ,1234   ,2022-01-01 15:00:00 ,099
AAAA     ,1234   ,2022-02-01 15:00:00 ,101
BBBB     ,5678   ,2022-01-05 18:00:00 ,050
BBBB     ,5678   ,2022-02-07 18:00:00 ,050

Do I need to cross join and lag, if so how can I accomplish that? Is there a way to accomplish this that's optimally efficient beyond some type of cross join?

Greg Pavlik · Accepted Answer

This is an alternate approach to using a window function. It's a subquery to find the closest date with a self-join. This approach can perform better than a window function, especially if there's a known limitation in date ranges. For example, this is looking for the closest change backward in time, but if it's known that the change would never be more than N days in the past, adding that to the subquery can help performance a lot.

create table CHARGE(Charge_ID string, User_ID int, Inserted_At timestamp_ntz);
insert into charge (Charge_ID,User_ID,Inserted_At) values
('AAAA'     ,1234   ,'2022-01-01 15:00:00'),
('AAAA'     ,1234   ,'2022-02-01 15:00:00'),
('BBBB'     ,5678   ,'2022-01-05 18:00:00'),
('BBBB'     ,5678   ,'2022-02-07 18:00:00')
;

create table PLAN(User_ID int,Plan_ID int ,Inserted_At timestamp_ntz);
insert into PLAN(User_ID,Plan_ID ,Inserted_At)  values
(1234   ,100    ,'2022-01-01 13:00:00'),
(1234   ,099    ,'2022-01-01 14:00:00'),
(1234   ,101    ,'2022-01-18 13:00:00'),
(5678   ,050    ,'2022-01-04 13:00:00'),
(5678   ,051    ,'2022-02-08 13:00:00');

with X as
(
select   CHARGE_ID
        ,USER_ID
        ,INSERTED_AT CHARGE_INSERTED_AT
        ,(select max(P.INSERTED_AT) from PLAN P 
          where C.USER_ID = P.USER_ID
          and C.INSERTED_AT >= P.INSERTED_AT) LAST_INSERTED_AT
from CHARGE C
)
select X.CHARGE_ID, X.USER_ID, X.CHARGE_INSERTED_AT, P.PLAN_ID
from X left join PLAN P 
  on X.USER_ID = P.USER_ID and X.LAST_INSERTED_AT = P.INSERTED_AT

How to perform SQL join on lag date in Snowflake?

Answers (2)

Related Questions