Reputation: 5295
I have a large table of (say) users and their visits to various webpages. An example of the table could be:
schema.page_views
date| user_id| webpage|
----------+---------+----------------+
01/05/2012|aaaaaaaaa|www.example1.com|
01/06/2012|aaaaaaaaa|www.example2.com|
01/06/2012|ababababa|www.example1.com|
...
01/05/2013|aaaaaaaaa|www.example2.com|
01/05/2013|ababababa|www.example1.com|
03/05/2013|aaaaaaaaa|www.example2.com|
Additionally, I have a calendar lookup table:
schema.weeks
week_number| begin_date| end_date|year|
-----------+-----------+-----------+----+
18| 2012-04-23| 2012-04-30|2012|
17| 2012-04-15| 2012-04-22|2012|
16| 2012-04-08| 2012-04-14|2012|
What I would ultimately like from these two tables is a trailing three week total of page views. More specifically, if I was issuing this query today, I would like something along the lines of:
webpage| 1-3| 4-6| 7-9| 10-12|
----------------+-------+-------+-------+-------+
www.example1.com| 124875| 175682| 168542| 159780|
www.example2.com| three week period numbers |
where '1-3' refers to the most recent three weeks, '4-6' refers to four to six weeks ago, etc.
So far, I've obtained the weekly total page views for each website and week, as follows:
-- join to get begin and end_dates
with t1 as (
select a.*, b.* from schema.page_views a
join schema.weeks b
on a.date between b.begin_date and b.end_date),
-- now aggregate per week
weekly_vol as (
select webpage, year, week_number, begin_date, end_date,
count(*) volume
from t1
group by 1, 2, 3, 4, 5)
select * from weekly_vol limit 1;
gives, for example,
webpage| year| week_number| begin_date| end_date| volume|
----------------+-----+------------+-----------+-----------+-------+
www.example1.com| 2012| 18| 2012-04-23| 2012-04-30| 41871|
With a table like this, how can I get to the table I would like? I have thought of joining it on itself with something like on a.week_number between b.week_number and b.week_number + 2
but I don't think this idea will work when the last twelve weeks include the start of a new year.
Has anyone done something similar, and if so, how did you approach the problem? Suggestions for a different approach would be appreciated equally.
Upvotes: 0
Views: 708
Reputation: 1269883
Your question is a bit unclear. Are week numbers just within a year or over all time? Do you want the sums relative to the last week in the data or relative to the current date?
In any case, the following does something that is at least similar to what you want. The key is the conditional aggregation in the outer query:
with t1 as (
select pv.*, w.*,
max(pv.week_number) over (partition by webpage, year) as last_week_number
from schema.page_views pv join
schema.weeks w
on pv.date between w.begin_date and w.end_date
),
weekly_vol as (
select webpage, year, week_number, begin_date, end_date,
count(*) as volume
from t1
group by webpage, year, week_number, begin_date, end_date
)
select webpage, year,
sum(case when last_week_number - week_number < 3
then volume else 0
end) as week_1_3,
sum(case when last_week_number - week_number >= 3 and last_week_number - week_number < 6
then volume else 0
end) as week_4_6
from weekly_vol
group by webpage, year;
Upvotes: 1