Reputation: 1544
I am trying to see how the cumulative number of subscribers changed over time based on unique email addresses and date they were created. Below is an example of a table I am working with.
I am trying to turn it into the table below. Email [email protected] was created twice and I would like to count it once. I cannot figure out how to generate the Running count distinct column.
Thanks for the help.
Upvotes: 2
Views: 5136
Reputation: 1269503
I would usually do this using row_number()
:
select date, count(*),
sum(count(*)) over (order by date),
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (order by date)
from (select t.*,
row_number() over (partition by email order by date) as seqnum
from t
) t
group by date
order by date;
This is similar to the version using lag()
. However, I get nervous using lag if the same email appears multiple times on the same date.
Upvotes: 3
Reputation: 222402
Here is a solution that does not uses sum over
, neither lag
... And does produces the correct results.
Hence it could appear as simpler to read and to maintain.
select
t1.date_created,
(select count(*) from my_table where date_created = t1.date_created) emails_created,
(select count(*) from my_table where date_created <= t1.date_created) cumulative_sum,
(select count( distinct email) from my_table where date_created <= t1.date_created) running_count_distinct
from
(select distinct date_created from my_table) t1
order by 1
Upvotes: 0
Reputation: 49260
Getting the total count and cumulative count is straight forward. To get the cumulative distinct count, use lag
to check if the email had a row with a previous date, and set the flag to 0 so it would be ignored during a running sum.
select distinct dt
,count(*) over(partition by dt) as day_total
,count(*) over(order by dt) as cumsum
,sum(flag) over(order by dt) as cumdist
from (select t.*
,case when lag(dt) over(partition by email order by dt) is not null then 0 else 1 end as flag
from tbl t
) t
Upvotes: 0