Reputation: 1293
In my problem, I want to be able to track whether a state has shifted from 04a. Lapsing - Lowering Engagement
to 03d. Engaged - Very High
after trigger_send_date
has occurred.
I believe a window function is required that checks whether a state is 04a. Lapsing - Lowering Engagement
before trigger_send_date
, and then measures whether that changes after trigger_send_date
is needed , but I can't figure out how to write this. I made a start below, but have difficulty continuing!
Ideally I'd like a new column that is a True/False as to whether that switching has occurred post trigger_send_date
within 31 days of the date occuring.
SELECT
cust_id,
state_date,
trigger_send_date,
total_state,
IF (
total_state IN ("04a. Lapsing - Lowering Engagement"),
True,
False
) as lapse,
-- Trying to write this column
sum(IF ((trigger_send_date >= state_date) & (total_state IN ("04a. Lapsing - Lowering Engagement") , 1, null)) OVER (
PARTITION BY cust_id,
state_date
ORDER BY
state_date
) as lapsed_and_returned_wirthin_31_days
FROM
base
ORDER BY
state_date,
trigger_send_date
Does anyone have any tips to help me write this?
This is what my table looks like with expected result as right-most column if it helps!
Upvotes: 0
Views: 120
Reputation: 2528
Let me preface my answer by saying that I don't have access to spark SQL, so the below is written in MySQL (it would probably work in SQL Server as well). I've had a look at the docs and the window frame should still work, you obviously might need to make some tweaks.
The window frame tells the partition function which rows to look at, by included UNBOUNDED PRECEDING you're telling the function to include every row before the current row, and using UNBOUNDED FOLLOWING you're telling the function to look at every row after the current row.
I tried to include another test, for a customer that was engaged before the trigger date and it seems to work. Obviously if you provided some sample data we could test further.
DROP TABLE IF EXISTS Base;
CREATE TABLE Base
(
cust_id BIGINT,
state_date DATE,
trigger_send_date DATE,
total_state VARCHAR(256)
);
INSERT INTO Base (cust_id,state_date, trigger_send_date, total_state) VALUES
(9177819375032029782,'2022-03-07','2022-03-14','03d. Engaged - Very High'),
(9177819375032029782,'2022-03-13','2022-03-14','04a. Lapsing - Lowering Engagement'),
(9177819375032029782,'2022-03-19','2022-03-14','03d. Engaged - Very High'),
(9177819375032029782,'2022-05-07','2022-03-14','03d. Engaged - Very High'),
(819375032029782,'2022-03-07','2022-03-14','03d. Engaged - Very High'),
(819375032029782,'2022-03-10','2022-03-14','04a. Lapsing - Lowering Engagement'),
(819375032029782,'2022-03-11','2022-03-14','03d. Engaged - Very High'),
(819375032029782,'2022-03-19','2022-03-14','03d. Engaged - Very High'),
(819375032029782,'2022-05-07','2022-03-14','03d. Engaged - Very High');
With LapsedCTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY state_date DESC) AS `RNum`
FROM Base
WHERE state_date <= trigger_send_date
AND LEFT(total_state, 3) IN ('03d','04a')
)
SELECT b.cust_id, b.state_date, b.trigger_send_date, b.total_state,
IF (
b.total_state IN ("04a. Lapsing - Lowering Engagement"),
True,
False
) as lapse,
-- Here we find the MIN engaged date (you can other states if needed) AFTER the trigger date.
-- Then we compare that to the trigger_send_date from the list of customers that were lapsed prior to the trigger_send_date (this will be empty for non-lapsed customers
-- so will default to 0 in our results column
-- Then we do a DATEDIFF between the trigger date and the engaged date, if the value is less than or equal to 31 days, Robert is your Mother's Brother..
IF(DATEDIFF(
MIN(IF(b.state_date > b.trigger_send_date AND LEFT(b.total_state, 3) IN ('03d'), b.state_date, NULL))
OVER (PARTITION BY b.cust_id ORDER BY b.state_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), l.trigger_send_date) <= 31, 1, 0) AS `lapsed_and_returned_wirthin_31_days`
-- Here's some other stuff just to show you the inner working of the above
/*
DATEDIFF(
MIN(IF(b.state_date > b.trigger_send_date AND LEFT(b.total_state, 3) IN ('03d'), b.state_date, NULL))
OVER (PARTITION BY b.cust_id ORDER BY b.state_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), b.trigger_send_date) AS `engaged_time_lag_days`,
MIN(IF(b.state_date > b.trigger_send_date AND LEFT(b.total_state, 3) IN ('03d'), b.state_date, NULL))
OVER (PARTITION BY b.cust_id ORDER BY b.state_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS `first_engaged_date_after_trigger`
*/
FROM Base b
LEFT JOIN LapsedCTE l ON l.cust_id = b.cust_id AND l.RNum = 1 AND LEFT(l.total_state, 3) IN ('04a');
It would be possible to remove the CTE if you need, it just makes things a bit cleaner.
Here's a runnable DBFiddle just incase you don't have access to a MySQL database.
Upvotes: 1