stephanieseayrad
stephanieseayrad

Reputation: 17

How do I use SQL to sequentially number events based on event type?

I am trying to figure out the number of days something occurred for each "event type." For example, I have several users, several event types and dates. I want to add a column for the "Days Since Last Event" (see image). What is the SQL syntax for this (specifically Snowflake)? The image shows exactly what I am trying to do but I created that example in Excel.

Output Goal

Upvotes: 1

Views: 655

Answers (2)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25928

Using NVL, FIRST_VALUE, LAG, DATEDIFF, DATEADD, IFF

a long winded set of steps looks like:

SELECT 
   user,
   type,
   dates,
   lead(type)over(partition by user order by dates) as lead_event,
   iff(lead_event != type, dates, null) as diff_event_date,
   first_value(dates)over(partition by user order by dates) as first_user_date   
   lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
   NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest
   ​datediff(day, date_of_true_interest, dates) as "days since last event"
FROM data
ORDER BY 1,3

but that will not work. so if we need to shuffle the second lag into a second stage of logic. thus:

SELECT 
   user,
   type,
   dates,
   lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
   NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest,
   datediff(day, date_of_true_interest, dates) as "days since last event"
FROM (
    SELECT 
       user,
       type,
       dates,
       lead(type)over(partition by user order by dates) as lead_event,
       iff(lead_event != type, dates, null) as diff_event_date,
       first_value(dates)over(partition by user order by dates) as first_user_date   
    FROM data
)
ORDER BY 1,3;

and that's ugly, so another layer to tidy up.

SELECT 
   user,
   type,
   dates,
   datediff(day, date_of_true_interest, dates) as "days since last event"
FROM (
    SELECT 
       user,
       type,
       dates,
       lag(diff_event_date) ignore nulls over (partition by user order by dates) as last_event_date,
       NVL(last_event_date, dateadd(day,-1,first_user_date)) AS date_of_true_interest
    FROM (
        SELECT 
           user,
           type,
           dates,
           lead(type)over(partition by user order by dates) as lead_event,
           iff(lead_event != type, dates, null) as diff_event_date,
           first_value(dates)over(partition by user order by dates) as first_user_date
        FROM data
    ) 
)
ORDER BY 1,3;

and mix with some data

WITH data AS (
    select * from values 
        (12345,'active',to_date('1/15/21', 'MM/DD/YY')),
        (12345,'active',to_date('1/16/21', 'MM/DD/YY')),
        (12345,'active',to_date('1/17/21', 'MM/DD/YY')),
        (12345,'dormant',to_date('1/18/21', 'MM/DD/YY')),
        (12345,'dormant',to_date('1/19/21', 'MM/DD/YY')),
        (12345,'churned',to_date('1/20/21', 'MM/DD/YY')),
        (12345,'churned',to_date('1/21/21', 'MM/DD/YY')),
        (12345,'churned',to_date('1/22/21', 'MM/DD/YY')),
        (39498,'active',to_date('1/15/21', 'MM/DD/YY')),
        (39498,'active',to_date('1/16/21', 'MM/DD/YY')),
        (39498,'dormant',to_date('1/17/21', 'MM/DD/YY')),
        (39498,'churned',to_date('1/18/21', 'MM/DD/YY'))
    v( user, type, dates)
)

we get the results:

USER TYPE DATES days since last event
12345 active 2021-01-15 1
12345 active 2021-01-16 2
12345 active 2021-01-17 3
12345 dormant 2021-01-18 1
12345 dormant 2021-01-19 2
12345 churned 2021-01-20 1
12345 churned 2021-01-21 2
12345 churned 2021-01-22 3
39498 active 2021-01-15 1
39498 active 2021-01-16 2
39498 dormant 2021-01-17 1
39498 churned 2021-01-18 1

Upvotes: 0

Praneeth
Praneeth

Reputation: 751

In the comment "if event type hasn't changed, take the count + 1, if the event type has changed, start count over at 1", this is important statement and can be translated as a window function on event type and then using row_number function.

There's an interesting window function called conditional_change_event https://docs.snowflake.com/en/sql-reference/functions/conditional_change_event.html which will help with the changes of current row vs previous row.

Note that there's an order by clause in conditional_change_event which makes us introduce an autoincrement id column. That said let's get into action with some sql queries

Table creation with sample values:

create or replace temporary table _temp (
 id int autoincrement,
 user int,
 _type varchar,
 dates date
);

insert into _temp(user, _type, dates)
values (12345,'active',to_date('1/15/21', 'MM/DD/YY')),
(12345,'active',to_date('1/16/21', 'MM/DD/YY')),
(12345,'active',to_date('1/17/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/18/21', 'MM/DD/YY')),
(12345,'dormant',to_date('1/19/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/20/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/21/21', 'MM/DD/YY')),
(12345,'churned',to_date('1/22/21', 'MM/DD/YY')),
(39498,'active',to_date('1/15/21', 'MM/DD/YY')),
(39498,'active',to_date('1/16/21', 'MM/DD/YY')),
(39498,'dormant',to_date('1/17/21', 'MM/DD/YY')),
(39498,'churned',to_date('1/18/21', 'MM/DD/YY'));

Goal

with event_changes as (
  select *,
  conditional_change_event(_type) over (order by id) as type_changes
  from _temp
)
select *, 
row_number() over(partition by _type, type_changes order by id) as days_since_last_event
from event_changes
order by id;

Output

  ID    USER    _TYPE   DATES   TYPE_CHANGES    DAYS_SINCE_LAST_EVENT
   1    12345   active  2021-01-15  0   1
   2    12345   active  2021-01-16  0   2
   3    12345   active  2021-01-17  0   3
   4    12345   dormant 2021-01-18  1   1
   5    12345   dormant 2021-01-19  1   2
   6    12345   churned 2021-01-20  2   1
   7    12345   churned 2021-01-21  2   2
   8    12345   churned 2021-01-22  2   3
   9    39498   active  2021-01-15  3   1
  10    39498   active  2021-01-16  3   2
  11    39498   dormant 2021-01-17  4   1
  12    39498   churned 2021-01-18  5   1

Upvotes: 0

Related Questions