Livewire
Livewire

Reputation: 79

BIGQUERY: How to query for a rolling monthly user active/churn

So I have a website with news articles and I'm trying to calculate 4 user types for each month. The user types are:

1. New User: A user who registers (their first article view) in the current month and viewed an article in the current month.

2. Retained User: A New User from the previous month OR a user who viewed an article in the previous month and in the current month.

3. Churned User: A New User or Retained User from the previous month who has not viewed an article in the current month OR a Churned User from the previous month.

4. Resurrected User: A Churned User from the previous month who has viewed an article in the current month.

**User Table A - Unique User Article Views**
- Current month = 2019-04-01 00:00:00 UTC

| user_id    | viewed_at                 |
------------------------------------------
| 4          | 2019-04-01 00:00:00 UTC   |
| 3          | 2019-04-01 00:00:00 UTC   |
| 2          | 2019-04-01 00:00:00 UTC   |
| 1          | 2019-03-01 00:00:00 UTC   |
| 3          | 2019-03-01 00:00:00 UTC   |
| 2          | 2019-02-01 00:00:00 UTC   |
| 1          | 2019-02-01 00:00:00 UTC   |
| 1          | 2019-01-01 00:00:00 UTC   |


The table above outlines the following user types:

2019-01-01
* User 1: New

2019-02-01
* User 1: Retained
* User 2: New

2019-03-01
* User 1: Retained
* User 2: Churned
* User 3: New

2019-04-01
* User 1: Churned
* User 2: Resurrected
* User 3: Retained
* User 4: New

My desired table COUNTS the distinct user_id for each user type in each month.

| month_viewed_at           | ut_new | ut_retained | ut_churned | ut_resurrected
------------------------------------------------------------------------------------
| 2019-04-01 00:00:00 UTC   | 1      | 1           | 1          | 1
| 2019-03-01 00:00:00 UTC   | 1      | 1           | 1          | 0
| 2019-02-01 00:00:00 UTC   | 1      | 1           | 0          | 0
| 2019-01-01 00:00:00 UTC   | 1      | 0           | 0          | 0 

Upvotes: 2

Views: 1003

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173121

I simply am not sure where to start

Hope you read all my comments and actually tried something by yourself, but as I don't see any update I suppose you still stuck here - so here we go ...

Below is for BigQuery Standard SQL and should give you direction

#standardSQL
WITH temp1 AS (
  SELECT user_id,
    FORMAT_DATE('%Y-%m', DATE(viewed_at)) month_viewed_at, 
    DATE_DIFF(DATE(viewed_at), '2000-01-01', MONTH) pos,
    DATE_DIFF(DATE(MIN(viewed_at) OVER(PARTITION BY user_id)), '2000-01-01', MONTH) first_pos
  FROM `project.dataset.table`
), temp2 AS (
  SELECT *, pos = first_pos AS new_user
  FROM temp1
), temp3 AS (
  SELECT *, LAST_VALUE(new_user) OVER(win) OR pos - 1 = LAST_VALUE(pos) OVER(win) AS retained_user
  FROM temp2
  WINDOW win AS (PARTITION BY user_id ORDER BY pos RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING)
)
SELECT month_viewed_at,
  COUNTIF(new_user) AS new_users,
  COUNTIF(retained_user) AS retained_users
FROM temp3
GROUP BY month_viewed_at
-- ORDER BY month_viewed_at DESC

If to apply to your sample data - result is

Row month_viewed_at new_users   retained_users   
1   2019-04         1           1    
2   2019-03         1           1    
3   2019-02         1           1    
4   2019-01         1           0    

In temp1 we preparing data by formatting viewed_at to needed format to present in output ad also we are transforming it to present consecutive number of month since some abstract data (2000-02-02) so we can use analytics function with RANGE as opposed to ROWS
In temp2 we just simply identifying new users and in temp3 - retained users

I think, this can be good start, so I am leaving the rest for you

Upvotes: 5

Related Questions