Reputation: 1095
I have a table with charges and user id's and a separate table with plan changes. I need to join these tables in such a way that I know what the user's plan id was at the time of the charge in Snowflake.
Charge Table:
Charge_ID,User_ID,Inserted_At
AAAA ,1234 ,2022-01-01 15:00:00
AAAA ,1234 ,2022-02-01 15:00:00
BBBB ,5678 ,2022-01-05 18:00:00
BBBB ,5678 ,2022-02-07 18:00:00
Plan Table:
User_ID,Plan_ID,Inserted_At
1234 ,100 ,2022-01-01 13:00:00
1234 ,099 ,2022-01-01 14:00:00
1234 ,101 ,2022-01-18 13:00:00
5678 ,050 ,2022-01-04 13:00:00
5678 ,051 ,2022-02-08 13:00:00
Result:
Charge_ID,User_ID,Charge_Inserted_At ,Plan_ID
AAAA ,1234 ,2022-01-01 15:00:00 ,099
AAAA ,1234 ,2022-02-01 15:00:00 ,101
BBBB ,5678 ,2022-01-05 18:00:00 ,050
BBBB ,5678 ,2022-02-07 18:00:00 ,050
Do I need to cross join and lag, if so how can I accomplish that? Is there a way to accomplish this that's optimally efficient beyond some type of cross join?
Upvotes: 1
Views: 628
Reputation: 175586
One way to solve it is to "transpose" single "InsetedAt" to pair date_from
and date_to
using LAG/LEAD and perform range join:
Pseudocode:
WITH ChargeTransposed AS (
SELECT Charge_id, User_ID, Inserted_At AS start_date,
LEAD(Inserted_at,1,'2999-12-31') OVER(PARTITION BY ChargeId, UserId
ORDER BY InsertedAt) AS end_date
FROM ChargeTable
)
SELECT *
FROM ChargeTransposed c
LEFT JOIN PlanTable p
ON c.User_ID = p.User_id
AND p.InsertedAt >= c.start_date AND p.InsetedAt < c.end_date
-- QUALIFTY ROW_NUMBER() OVER(PARTITION BY c.Charge_Id, c.User_Id, c.Start_date
ORDER BY p.InsertAt DESC) = 1
Upvotes: 1
Reputation: 11046
This is an alternate approach to using a window function. It's a subquery to find the closest date with a self-join. This approach can perform better than a window function, especially if there's a known limitation in date ranges. For example, this is looking for the closest change backward in time, but if it's known that the change would never be more than N days in the past, adding that to the subquery can help performance a lot.
create table CHARGE(Charge_ID string, User_ID int, Inserted_At timestamp_ntz);
insert into charge (Charge_ID,User_ID,Inserted_At) values
('AAAA' ,1234 ,'2022-01-01 15:00:00'),
('AAAA' ,1234 ,'2022-02-01 15:00:00'),
('BBBB' ,5678 ,'2022-01-05 18:00:00'),
('BBBB' ,5678 ,'2022-02-07 18:00:00')
;
create table PLAN(User_ID int,Plan_ID int ,Inserted_At timestamp_ntz);
insert into PLAN(User_ID,Plan_ID ,Inserted_At) values
(1234 ,100 ,'2022-01-01 13:00:00'),
(1234 ,099 ,'2022-01-01 14:00:00'),
(1234 ,101 ,'2022-01-18 13:00:00'),
(5678 ,050 ,'2022-01-04 13:00:00'),
(5678 ,051 ,'2022-02-08 13:00:00');
with X as
(
select CHARGE_ID
,USER_ID
,INSERTED_AT CHARGE_INSERTED_AT
,(select max(P.INSERTED_AT) from PLAN P
where C.USER_ID = P.USER_ID
and C.INSERTED_AT >= P.INSERTED_AT) LAST_INSERTED_AT
from CHARGE C
)
select X.CHARGE_ID, X.USER_ID, X.CHARGE_INSERTED_AT, P.PLAN_ID
from X left join PLAN P
on X.USER_ID = P.USER_ID and X.LAST_INSERTED_AT = P.INSERTED_AT
Upvotes: 1