user2868900
user2868900

Reputation: 771

Fill in blank dates for rolling average - CTE in Snowflake

I have two tables – activity and purchase

Activity table:

user_id     date      videos_watched
   1     2020-01-02        3
   1     2020-01-04        5
   1     2020-01-07        5

Purchase table:

user_id  purchase_date 
   1       2020-01-01 
   2       2020-02-02

What I would like to do is to get a 30 day rolling average since purchase on how many videos has been watched.

The base query is like this:

    SELECT
    DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
    AVG(A.VIDEOS_VIEWED)
    FROM PURCHASE P
    LEFT OUTER JOIN ACTIVITY A ON P.USER_ID = A.USER_ID AND
        A.DATE >= P.PURCHASE_DATE AND A.DATE <= DATEADD(DAY, 30, P.PURCHASE_DATE)
    GROUP BY 1;

However, the Activity table only has records for each day a video has been logged. I would like to fill in the blanks for days a video has not been viewed.

I have started to look into using a CTE like this:

    WITH cte AS (
        SELECT date('2020-01-01') as fdate
        UNION ALL
        SELECT CAST(DATEADD(day,1,fdate) as date)
    FROM cte
    WHERE fdate < date('2020-04-01')
    ) select * from cte 
      cross join purchases p
      left outer join activity a 
      on p.user id = a.user_id 
      and a.fdate = p.purchase_date
      and a.date >= p.purchase_date and a.date <= dateadd(day, 30, p.purchase_date)

The end goal is to have something like this:

days_since_purchase    videos_watched
        1                   3
        2                   0 --CTE coalesce inserted value
        3                   0
        4                   5

Been trying for the last couple of hours to get it right, but still can't really get the hang of it.

Upvotes: 1

Views: 975

Answers (2)

GMB
GMB

Reputation: 222482

You can use a recursive query to generate the 30 days following each purchase, then bring the activity table:

with cte as (
    select 
        purchase_date,
        client_id,
        0 days_since_purchase,
        purchase_date dt
    from purchases 
    union all
    select 
        purchase_date,
        client_id,
        days_since_purchase + 1
        dateadd(day, days_since_purchase + 1, purchase_date)
    from cte
    where days_since_purchase < 30

)
select 
    c.days_since_purchase,
    avg(colaesce(a. videos_watch, 0)) avg_ videos_watch
from cte c
left join activity a
    on  a.client_id = c.client_id
    and a.fdate = c.purchase_date
    and a.date = c.dt
group by c.days_since_purchase

Your question is unclear on whether you have a column in the activity table that stores the purchase date each row relates to. Your query has column fdate but not your sample data. I used that column in the query (without such column, you might end up counting the same activity in different purchases).

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269953

If you want to fill in the gaps in the result set, then I think you should be generating integers rather than dates:

WITH cte AS (
      SELECT 1 as day_since_purchase
      UNION ALL
      SELECT 1 + day_since_purchase
      FROM cte
      WHERE day_since_purchase < 4
     )
SELECT cte.day_since_purchase, COALESCE(avg_videos_viewed, 0)
FROM cte LEFT JOIN
     (SELECT DATEDIFF(DAY, p.purchase_date, a.date) AS day_since_purchase,
             AVG(A.VIDEOS_VIEWED) as avg_videos_viewed
      FROM purchases p JOIN
           activity a 
           ON p.user id = a.user_id AND
              a.fdate = p.purchase_date AND
              a.date >= p.purchase_date AND
              a.date <= dateadd(day, 30, p.purchase_date)
      GROUP BY 1
     ) pa
     ON pa.day_since_purchase = cte.day_since_purchase;

Upvotes: 1

Related Questions