Rallad
Rallad

Reputation: 101

Grouping data by overlapping date selections in Oracle SQL

I was wondering if there is an efficient way to group data specific overlapping time frames. My main goal in mastering this exercise is to query this table:

transaction_date  | item_type | profit
  'dd/mm/yy'            ""        ''

example:

transaction_date  | item_type | profit
  03/01/2018           Book        2
  03/05/2018           Book        5
  04/05/2018           Book        6
  03/06/2018           Misc.       20

And obtain

(1) A grouping by weeks, and years so the query has the columns year | date

This was already done successfully by simply using group by with the following lines

to_char(transaction_date, 'YY'), to_char(transaction_date, 'WW')

which, along some other desired grouping for item_type and profit, gives

year | week  | item_type | profit
 18     17        Book        2
 18     18        Book       11
 18     22        Misc       20

(2) An additional column, that has the profit for the last 4 weeks (if an entry is in week 7, this will be the totals for weeks 3-6, rather than exactly 4 weeks ago.)

year | week  | item_type | profit  | profit_last4wks
 18     17        Book        2           0
 18     18        Book       11           2
 18     22        Misc       20           0

This step is the problem, since I am currently very puzzled on how to approach it, and where to start as a first step. Any suggestions, or a point in the right direction would be very helpful.

I had the idea to do steps (1) and (2) in separate queries and join then, but even obtaining (2) separately has me stumped. I also thought that it might be possible to obtain (1) first, then append the desired column in some way.

Upvotes: 0

Views: 332

Answers (2)

Thorsten Kettner
Thorsten Kettner

Reputation: 94859

In order to add up the previous weeks, you must be able to calculate which week is before a given week. By your definition (i.e. merging weeks 52 and 53) the week 2018/52 is the week before 2019/01. This calculation is best done with a running week number. Starting with year 0 week 1 as 1, the first week in 2019 is the 104989th week for instance and the last week in 2018 is the 104988th.

You want to show a week's profit and also the four previous weeks' profit sum. This would be easy with SUM OVER, but there can be gaps (missing weeks), so we don't know how many rows we must get back. Example:

year | week | profit | last4wks
-----+------+--------+---------
2018 | 52   |     10 |        0
2019 | 01   |     20 |       10 (2018/49 bis 2018/52 => 1 row)
2019 | 02   |     30 |       30 (2018/50 bis 2019/01 => 2 rows)
2019 | 06   |     40 |       30 (2019/02 bis 2019/05 => 1 row)

I'm solving this with LAG in order to see whether rows -1, -2, -3, and -4 are in the desired range of the four previous weeks. Then I use LAG again to get the weeks' profits. Maybe there exists a more elegant way that just doesn't come to mind right now.

with weekdata as
(
  select
    extract(year from transaction_date) * 52 +
    to_number(to_char(transaction_date, 'ww')) as runweek,
    profit
  from mytable
)
select
  trunc(runweek / 52) as year,
  mod(runweek, 52) as week,
  sum(profit),
  case when lag(runweek, 1) over (order by runweek) between runweek - 4 and runweek - 1 then
    lag(sum(profit), 1) else 0 end +
  case when lag(runweek, 2) over (order by runweek) between runweek - 4 and runweek - 1 then
    lag(sum(profit), 2) else 0 end +
  case when lag(runweek, 3) over (order by runweek) between runweek - 4 and runweek - 1 then
    lag(sum(profit), 3) else 0 end +
  case when lag(runweek, 4) over (order by runweek) between runweek - 4 and runweek - 1 then
    lag(sum(profit), 4) else 0 end as profit_last4wks
from weekdata
group by runweek
order by runweek;

The query doesn't take the type into account, as it's not clear to me how you want it be dealt with. Maybe you want one result row per week as shown and just add the list of affected types (LISTAGG DISTINCT). Or you want one result row per week and type, for which you'd have to add the type to the GROUP BY clause and add PARTITION clauses.

Upvotes: 1

gsalem
gsalem

Reputation: 2028

As mathguy said, you're grouping by the 2 to_char function, plus the item_type. To get the profit_last4weeks data, you need to use the analytical version of the SUM function, look at the doc for the analytical clause and the SUM

Upvotes: 1

Related Questions