Simon Breton
Simon Breton

Reputation: 2876

conditional running sum

I'm trying to return the number of unique users that converted over time.

So I have the following query:

WITH CTE
As
(
SELECT '2020-04-01' as date,'userA' as user,1 as goals  Union all
SELECT '2020-04-01','userB',0   Union all
SELECT '2020-04-01','userC',0   Union all
SELECT '2020-04-03','userA',1   Union all
SELECT '2020-04-05','userC',1  Union all
SELECT '2020-04-06','userC',0  Union all
SELECT '2020-04-06','userB',0
)
select
  date,
  COUNT(DISTINCT
  IF
    (goals >= 1,
      user,
      NULL)) AS cad_converters
from CTE
group by date

I'm trying to count distinct user but I need to find a way to apply the distinct count to the whole date. I probably need to do something like a cumulative some...

expected result would be something like this

date, goals, total_unique_converted_users
'2020-04-01',1,1
'2020-04-01',0,1
'2020-04-01',0,1
'2020-04-03',1,2
'2020-04-05',1,2
'2020-04-06',0,2
'2020-04-06',0,2

Upvotes: 0

Views: 86

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270873

I would approach this by tagging when the first goal is scored for each name. Then simply do a cumulative sum:

select cte.* except (seqnum), countif(seqnum = 1) over (order by date)
from (select cte.*,
             (case when goals = 1 then row_number() over (partition by user, goals order by date) end) as seqnum
      from cte
     ) cte;

I realize this can be expressed without the case in the subquery:

select cte.* except (seqnum), countif(seqnum = 1 and goals = 1) over (order by date)
from (select cte.*,
             row_number() over (partition by user, goals order by date) as seqnum
      from cte
     ) cte;

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173171

Below is for BigQuery Standard SQL

#standardSQL
SELECT t.date, t.goals, total_unique_converted_users
FROM `project.dataset.table` t
LEFT JOIN (
  SELECT a.date, 
    COUNT(DISTINCT IF(b.goals >= 1, b.user, NULL)) AS total_unique_converted_users
  FROM `project.dataset.table` a
  CROSS JOIN `project.dataset.table` b
  WHERE a.date >= b.date
  GROUP BY a.date
)
USING(date)   

Upvotes: 1

Related Questions