Reputation: 17
I am trying to figure out the number of days something occurred for each "event type." For example, I have several users, several event types and dates. I want to add a column for the "Days Since Last Event" (see image). What is the SQL syntax for this (specifically Snowflake)? The image shows exactly what I am trying to do but I created that example in Excel.
Upvotes: 1
Views: 655
Reputation: 25928
Using NVL, FIRST_VALUE, LAG, DATEDIFF, DATEADD, IFF
a long winded set of steps looks like:
SELECT
user,
type,
dates,
lead(type)over(partition by user order by dates) as lead_event,
iff(lead_event != type, dates, null) as diff_event_date,
first_value(dates)over(partition by user order by dates) as first_user_date
lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest
datediff(day, date_of_true_interest, dates) as "days since last event"
FROM data
ORDER BY 1,3
but that will not work. so if we need to shuffle the second lag into a second stage of logic. thus:
SELECT
user,
type,
dates,
lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest,
datediff(day, date_of_true_interest, dates) as "days since last event"
FROM (
SELECT
user,
type,
dates,
lead(type)over(partition by user order by dates) as lead_event,
iff(lead_event != type, dates, null) as diff_event_date,
first_value(dates)over(partition by user order by dates) as first_user_date
FROM data
)
ORDER BY 1,3;
and that's ugly, so another layer to tidy up.
SELECT
user,
type,
dates,
datediff(day, date_of_true_interest, dates) as "days since last event"
FROM (
SELECT
user,
type,
dates,
lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest
FROM (
SELECT
user,
type,
dates,
lead(type)over(partition by user order by dates) as lead_event,
iff(lead_event != type, dates, null) as diff_event_date,
first_value(dates)over(partition by user order by dates) as first_user_date
FROM data
)
)
ORDER BY 1,3;
and mix with some data
WITH data AS (
select * from values
(12345,'active',to_date('1/15/21', 'MM/DD/YY')),
(12345,'active',to_date('1/16/21', 'MM/DD/YY')),
(12345,'active',to_date('1/17/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/18/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/19/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/20/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/21/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/22/21', 'MM/DD/YY')),
(39498,'active',to_date('1/15/21', 'MM/DD/YY')),
(39498,'active',to_date('1/16/21', 'MM/DD/YY')),
(39498,'dormant',to_date('1/17/21', 'MM/DD/YY')),
(39498,'churned',to_date('1/18/21', 'MM/DD/YY'))
v( user, type, dates)
)
we get the results:
USER | TYPE | DATES | days since last event |
---|---|---|---|
12345 | active | 2021-01-15 | 1 |
12345 | active | 2021-01-16 | 2 |
12345 | active | 2021-01-17 | 3 |
12345 | dormant | 2021-01-18 | 1 |
12345 | dormant | 2021-01-19 | 2 |
12345 | churned | 2021-01-20 | 1 |
12345 | churned | 2021-01-21 | 2 |
12345 | churned | 2021-01-22 | 3 |
39498 | active | 2021-01-15 | 1 |
39498 | active | 2021-01-16 | 2 |
39498 | dormant | 2021-01-17 | 1 |
39498 | churned | 2021-01-18 | 1 |
Upvotes: 0
Reputation: 751
In the comment "if event type hasn't changed, take the count + 1, if the event type has changed, start count over at 1", this is important statement and can be translated as a window function on event type and then using row_number
function.
There's an interesting window function called conditional_change_event
https://docs.snowflake.com/en/sql-reference/functions/conditional_change_event.html
which will help with the changes of current row vs previous row.
Note that there's an order by
clause in conditional_change_event
which makes us introduce an autoincrement
id column. That said let's get into action with some sql queries
Table creation with sample values:
create or replace temporary table _temp (
id int autoincrement,
user int,
_type varchar,
dates date
);
insert into _temp(user, _type, dates)
values (12345,'active',to_date('1/15/21', 'MM/DD/YY')),
(12345,'active',to_date('1/16/21', 'MM/DD/YY')),
(12345,'active',to_date('1/17/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/18/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/19/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/20/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/21/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/22/21', 'MM/DD/YY')),
(39498,'active',to_date('1/15/21', 'MM/DD/YY')),
(39498,'active',to_date('1/16/21', 'MM/DD/YY')),
(39498,'dormant',to_date('1/17/21', 'MM/DD/YY')),
(39498,'churned',to_date('1/18/21', 'MM/DD/YY'));
Goal
with event_changes as (
select *,
conditional_change_event(_type) over (order by id) as type_changes
from _temp
)
select *,
row_number() over(partition by _type, type_changes order by id) as days_since_last_event
from event_changes
order by id;
Output
ID USER _TYPE DATES TYPE_CHANGES DAYS_SINCE_LAST_EVENT
1 12345 active 2021-01-15 0 1
2 12345 active 2021-01-16 0 2
3 12345 active 2021-01-17 0 3
4 12345 dormant 2021-01-18 1 1
5 12345 dormant 2021-01-19 1 2
6 12345 churned 2021-01-20 2 1
7 12345 churned 2021-01-21 2 2
8 12345 churned 2021-01-22 2 3
9 39498 active 2021-01-15 3 1
10 39498 active 2021-01-16 3 2
11 39498 dormant 2021-01-17 4 1
12 39498 churned 2021-01-18 5 1
Upvotes: 0