Reputation: 327
I am new to SQL, and I have a dataset with date values and domain column. Domain column only contains values 'personal' and 'business'. What I am trying to accomplish is to calculate weekly and monthly rolling count for each domain type.
What I thought of doing is creating 2 separate columns - is_personal and is_business - with values 1 for rows where domain_type is of appropriate value. For example, if domain_type is 'personal', then there would be value of 1 in is_personal column. Otherwise, 1 would be in the row of is_business. Then, I was going to calculate the rolling sum.
However, I wonder if I can avoid creating additional columns and perform weekly and monthly rolling count straight from the string column in Google Big Query.
What I tried so far is "grouping by" date column using DATE_TRUNC(CAST(created_at AS date), ISOWEEK)
to 'roll-up' dates by week periods. When I try any rolling functions on the domain_type column, I get many errors. Some are related to trying functions not recognizable by Google Big Query, some related to the fact that I am working with the string column, and so on.
The ultimate goal I am trying to achieve is calculate weekly and monthly rolling count of 'business' and 'personal' domain types. Please let me know if I can provide additional information that will help. Thank you!
Current look of data:
Date domain_type
2017-10-02 personal
2017-10-03 business
2017-10-04 personal
2017-10-05 business
2017-10-06 personal
2017-10-07 business
2017-10-08 personal
2017-10-09 business
2017-10-10 personal
2017-10-11 business
2017-10-12 personal
2017-10-13 business
2017-10-14 personal
2017-10-15 business
Let's say that during the week of 2017-10-02, a total of 10 users signed up with personal email address and a total of 20 signed up with business email address. During the week of 2017-10-09, a total of 25 signed up with personal email and 30 signed up with business email. Hence, for 2 weeks, rolling count for personal domain type is 35, and rolling count for business domain type is 50.
The output I am trying to achieve:
Date domain_type rolling_count_for_week
2017-10-02 personal 10
2017-10-02 business 20
2017-10-09 personal 35
2017-10-09 business 50
Upvotes: 0
Views: 1090
Reputation: 173121
Below is for BigQuery Standard SQL
#standardSQL
SELECT Date, domain_type,
SUM(IF(domain_type = 'personal', personal, business)) AS rolling_count_for_week
FROM (
SELECT Date, type AS domain_type,
SUM(IF(domain_type = 'personal' AND domain_type = type, 1, 0)) OVER(ORDER BY Date) personal,
SUM(IF(domain_type = 'business' AND domain_type = type, 1, 0)) OVER(ORDER BY Date) business
FROM `project.dataset.table`,
UNNEST(['personal', 'business']) type
)
WHERE EXTRACT(DAYOFWEEK FROM Date) = 2
GROUP BY Date, domain_type
if to apply to sample data from your question - output is
Row Date domain_type rolling_count_for_week
1 2017-10-02 personal 1
2 2017-10-02 business 0
3 2017-10-09 personal 4
4 2017-10-09 business 4
What if, for one particular week, there is no data on dow=2 but there is data for the other days?
Good point, somehow I assumed that at least one entry per day will exist :o)
See version below that does not have this dependency
#standardSQL
WITH calendar_type AS (
SELECT Date, type
FROM (
SELECT MIN(Date) min_date, MAX(Date) max_date
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) Date,
UNNEST(['personal', 'business']) type
)
SELECT Date, domain_type,
SUM(IF(domain_type = 'personal', personal, business)) AS rolling_count_for_week
FROM (
SELECT c.Date, type AS domain_type,
SUM(IF(domain_type = 'personal' AND domain_type = type, 1, 0)) OVER(ORDER BY c.Date) personal,
SUM(IF(domain_type = 'business' AND domain_type = type, 1, 0)) OVER(ORDER BY c.Date) business
FROM calendar_type c
LEFT JOIN `project.dataset.table` t
ON c.Date = t.Date AND c.type = t. domain_type
)
WHERE EXTRACT(DAYOFWEEK FROM Date) = 2
GROUP BY Date, domain_type
Upvotes: 1
Reputation: 86775
WITH
weekly AS
(
SELECT
DATE_TRUNC(CAST(created_at AS date), ISOWEEK) AS created_week,
*
FROM
yourData
)
SELECT
created_week,
domain_type,
SUM(COUNT(*)) OVER (PARTITION BY domain_type ORDER BY created_week) AS cumulative_emails
FROM
weekly
GROUP BY
created_week,
domain_type
Upvotes: 1
Reputation: 1270713
If you want the number of distinct values in a week, use aggregation:
select date_trunc(date, week) as wk, email_type,
count(*) -- or count(distinct email) if they are not already unique
from t
group by wk, email_type
order by 1, 2;
I don't see anything "rolling" about what you are trying to do -- unless, perhaps, you want the count for two consecutive weeks. If that is the case, use window functions:
select date_trunc(date, week) as wk, email_type,
count(*) as this_week,
sum(count(*)) over (partition by email_type order by date_trunc(date, wk) rows between 1 preceding and current row) as 2_week_count
from t
group by wk, email_type
order by 1, 2;
Upvotes: 1