Reputation: 12292
Given an event
table, I want to find the count of distinct user_id
s for a specific event type
in four different windows:
I came up with this idea of a query (non-working), which probably shows best what I try to accomplish:
select
d as "Date",
count(distinct user_id) over (
order by d
rows between 365 preceding and current row
) "Yearly Active",
count(distinct user_id) over (
order by d
rows between 30 preceding and current row
) "Monthly Active",
count(distinct user_id) over (
order by d
rows between 7 preceding and current row
) "Weekly Active",
count(distinct user_id) as "Daily Active"
from (
select
date(to_timestamp(event_time/1000) at time zone 'Europe/Berlin') d,
user_id
from event_entity
where type = 'REFRESH_TOKEN'
group by 1, user_id
order by d
) daily_users
But we cannot use distinct
in window functions. What is a better approach to the query?
Upvotes: 0
Views: 114
Reputation: 661
The best approach to this query is not using window functions, you can get the data you need just by using non-windowed count
and case when
statements.
First get all distinct date and user_id values in a CTE and all distinct dates in another CTE
WITH daily_users AS(
select distinct
date(to_timestamp(event_time/1000) at time zone 'Europe/Berlin') d,
user_id
from event_entity
where type = 'REFRESH_TOKEN'
),
all_dates AS(
select distinct
date(to_timestamp(event_time/1000) at time zone 'Europe/Berlin') d
from event_entity
where type = 'REFRESH_TOKEN'
)
Join the dates with the daily_users cte, then use the a CASE
statement inside the COUNT
, replacing user_id
with NULL
for all dates outside the relevant range:
select all_dates.d as "date",
--daily
count(distinct case when all_dates.d = daily_users.d then daily_users.user_id else null end) as daily_active,
--weekly
count(distinct case when all_dates.d - daily_users.d < 7 then daily_users.user_id else null end) as weekly_active,
--30 days
count(distinct case when all_dates.d - daily_users.d < 30 then daily_users.user_id else null end) as monthly_active,
--yearly, just count distinct user_id because the 365 day condition is already provided by the join
count(distinct daily_users.user_id) as yearly_active
from all_dates
inner join daily_users on all_dates.d >= daily_users.d and (all_dates.d - daily_users.d < 365) --filter only rows between the date in all_dates and one year before
group by all_dates.d
Some notes:
count
ignores null values, that's why the case
expressions set user_id to null in all rows outside the relevant periodUpvotes: 1
Reputation: 3183
Edited
If you want to use the windows function, you can use a trick. You can use the bellow formula to remove duplicate user_id
:
abs(
dense_rank() over (order by d, user_id)
- dense_rank() over (order by d, user_id DESC)
+ 1
)
But I don't know the result of the query it is the same you expected.
Please check this structure. Is this query has resulted in what you expect?
select distinct on (d)
d,
----
abs(
dense_rank() over asc_total_preceding
- dense_rank() over desc_total_preceding
) + 1 as "Daily Active",
----
abs(
dense_rank() over asc_7_preceding
- dense_rank() over desc_7_preceding
) + 1 as "Weekly Active",
----
abs(
dense_rank() over asc_30_preceding
- dense_rank() over desc_30_preceding
) + 1 as "Monthly Active",
----
abs(
dense_rank() over asc_365_preceding
- dense_rank() over desc_365_preceding
) + 1 as "Yearly Active"
from (
select
date(to_timestamp(event_time/1000) at time zone 'Europe/Berlin') d,
user_id
from event_entity
where type = 'REFRESH_TOKEN'
group by 1, user_id
order by d
) daily_users
window
asc_total_preceding as (order by d, user_id),
desc_total_preceding as (order by d, user_id),
asc_7_preceding as (order by d, user_id rows between 7 preceding and current row),
desc_7_preceding as (order by d, user_id desc rows between 7 preceding and current row),
asc_30_preceding as (order by d, user_id rows between 30 preceding and current row),
desc_30_preceding as (order by d, user_id desc rows between 30 preceding and current row),
asc_365_preceding as (order by d, user_id rows between 365 preceding and current row),
desc_365_preceding as (order by d, user_id desc rows between 365 preceding and current row)
P.S (Old query)
In my opinion, better to change the query structure and use group by
and filter
to calculate the count of users on each date
select
d as "Date",
count(distinct user_id) filter (
where d between current_date - 365 and current_date
) as "Yearly Active",
count(distinct user_id) filter (
where d between current_date - 30 and current_date
) as "Monthly Active",
count(distinct user_id) filter (
where d between current_date - 7 and current_date
) as "Weekly Active",
count(distinct user_id) filter (
where d = current_date
) as "Daily Active"
from (
select
date(to_timestamp(event_time/1000) at time zone 'Europe/Berlin') d,
user_id
from event_entity
where type = 'REFRESH_TOKEN'
group by 1, user_id
order by d
) daily_users
group by d
Upvotes: 0