Reputation: 101
I was wondering if there is an efficient way to group data specific overlapping time frames. My main goal in mastering this exercise is to query this table:
transaction_date | item_type | profit
'dd/mm/yy' "" ''
example:
transaction_date | item_type | profit
03/01/2018 Book 2
03/05/2018 Book 5
04/05/2018 Book 6
03/06/2018 Misc. 20
And obtain
(1) A grouping by weeks, and years so the query has the columns year | date
This was already done successfully by simply using group by with the following lines
to_char(transaction_date, 'YY'), to_char(transaction_date, 'WW')
which, along some other desired grouping for item_type and profit, gives
year | week | item_type | profit
18 17 Book 2
18 18 Book 11
18 22 Misc 20
(2) An additional column, that has the profit for the last 4 weeks (if an entry is in week 7, this will be the totals for weeks 3-6, rather than exactly 4 weeks ago.)
year | week | item_type | profit | profit_last4wks
18 17 Book 2 0
18 18 Book 11 2
18 22 Misc 20 0
This step is the problem, since I am currently very puzzled on how to approach it, and where to start as a first step. Any suggestions, or a point in the right direction would be very helpful.
I had the idea to do steps (1) and (2) in separate queries and join then, but even obtaining (2) separately has me stumped. I also thought that it might be possible to obtain (1) first, then append the desired column in some way.
Upvotes: 0
Views: 332
Reputation: 94859
In order to add up the previous weeks, you must be able to calculate which week is before a given week. By your definition (i.e. merging weeks 52 and 53) the week 2018/52 is the week before 2019/01. This calculation is best done with a running week number. Starting with year 0 week 1 as 1, the first week in 2019 is the 104989th week for instance and the last week in 2018 is the 104988th.
You want to show a week's profit and also the four previous weeks' profit sum. This would be easy with SUM OVER
, but there can be gaps (missing weeks), so we don't know how many rows we must get back. Example:
year | week | profit | last4wks -----+------+--------+--------- 2018 | 52 | 10 | 0 2019 | 01 | 20 | 10 (2018/49 bis 2018/52 => 1 row) 2019 | 02 | 30 | 30 (2018/50 bis 2019/01 => 2 rows) 2019 | 06 | 40 | 30 (2019/02 bis 2019/05 => 1 row)
I'm solving this with LAG
in order to see whether rows -1, -2, -3, and -4 are in the desired range of the four previous weeks. Then I use LAG
again to get the weeks' profits. Maybe there exists a more elegant way that just doesn't come to mind right now.
with weekdata as
(
select
extract(year from transaction_date) * 52 +
to_number(to_char(transaction_date, 'ww')) as runweek,
profit
from mytable
)
select
trunc(runweek / 52) as year,
mod(runweek, 52) as week,
sum(profit),
case when lag(runweek, 1) over (order by runweek) between runweek - 4 and runweek - 1 then
lag(sum(profit), 1) else 0 end +
case when lag(runweek, 2) over (order by runweek) between runweek - 4 and runweek - 1 then
lag(sum(profit), 2) else 0 end +
case when lag(runweek, 3) over (order by runweek) between runweek - 4 and runweek - 1 then
lag(sum(profit), 3) else 0 end +
case when lag(runweek, 4) over (order by runweek) between runweek - 4 and runweek - 1 then
lag(sum(profit), 4) else 0 end as profit_last4wks
from weekdata
group by runweek
order by runweek;
The query doesn't take the type into account, as it's not clear to me how you want it be dealt with. Maybe you want one result row per week as shown and just add the list of affected types (LISTAGG DISTINCT
). Or you want one result row per week and type, for which you'd have to add the type to the GROUP BY
clause and add PARTITION
clauses.
Upvotes: 1
Reputation: 2028
As mathguy said, you're grouping by the 2 to_char function, plus the item_type. To get the profit_last4weeks data, you need to use the analytical version of the SUM function, look at the doc for the analytical clause and the SUM
Upvotes: 1