lareven
lareven

Reputation: 379

oracle sql: efficient way to calculate business days in a month

I have a pretty huge table with columns dates, account, amount, etc. eg.

date        account       amount
4/1/2014    XXXXX1        80
4/1/2014    XXXXX1        20
4/2/2014    XXXXX1        840  
4/3/2014    XXXXX1        120
4/1/2014    XXXXX2        130
4/3/2014    XXXXX2        300
...........

(I have 40 months' worth of daily data and multiple accounts.)

The final output I want is the average amount of each account each month. Since there may or may not be record for any account on a single day, and I have a seperate table of holidays from 2011~2014, I am summing up the amount of each account within a month and dividing it by the number of business days of that month. Notice that there is very likely to be record(s) on weekends/holidays, so I need to exclude them from calculation. Also, I want to have a record for each of the date available in the original table. eg.

date        account       amount
4/1/2014    XXXXX1        48 ((80+20+840+120)/22)
4/2/2014    XXXXX1        48  
4/3/2014    XXXXX1        48
4/1/2014    XXXXX2        19 ((130+300)/22)
4/3/2014    XXXXX2        19
...........

(Suppose the above is the only data I have for Apr-2014.)

I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like:

<!-- language: lang-sql -->  

select
   date,
   account,
   sum(amount/days_mon) over (partition by last_day(date))
from(
     select
        date,
        -- there are more calculation to get the account numbers,
        -- so this subquery is necessary
        account, 
        amount,
        -- this is a list of month-end dates that the number of 
        -- business days in that month is 19. similar below.

        case when last_day(date) in ('','',...,'') then 19                  
             when last_day(date) in ('','',...,'') then 20
             when last_day(date) in ('','',...,'') then 21
             when last_day(date) in ('','',...,'') then 22
             when last_day(date) in ('','',...,'') then 23
        end as days_mon
      from mytable tb
      inner join lookup_businessday_list busi
        on tb.date = busi.date)

So how can I perform the above purpose efficiently? Thank you!

Upvotes: 3

Views: 1375

Answers (2)

APC
APC

Reputation: 146239

This approach uses sub-query factoring - what other RDBMS flavours call common table expressions. The attraction here is that we can pass the output from one CTE as input to another. Find out more.

The first CTE generates a list of dates in a given month (you can extend this over any range you like).

The second CTE uses an anti-join on the first to filter out dates which are holidays and also dates which aren't weekdays. Note that Day Number varies depending according to the NLS_TERRITORY setting; in my realm the weekend is days 6 and 7 but SQL Fiddle is American so there it is 1 and 7.

with dates as ( select date '2014-04-01' + ( level -  1) as d
               from dual
               connect by level <= 30 )
     , bdays as ( select d
                         , count(d) over () tot_d 
                  from   dates
                         left join holidays
                              on dates.d = holidays.hol_date
                  where holidays.hol_date is null   
                  and   to_number(to_char(dates.d, 'D')) between 2 and 6 
                )
select yt.account
       , yt.txn_date
       , sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
            /tot_d as avg_amt
from your_table yt
     join bdays 
          on bdays.d = yt.txn_date
order by yt.account
       , yt.txn_date
/

I haven't rounded the average amount.

Upvotes: 2

vav
vav

Reputation: 4694

You have 40 month of data, this data should be very stable. I will assume that you have a cold body (big and stable easily definable range of data) and hot tail (small and active part).

Next, I would like to define a minimal period. It is a data range that is a smallest interval interesting for Business. It might be year, month, day, hour, etc. Do you expect to get questions like "what was averege for that account between 1900 and 12am yesterday?". I will assume that the answer is DAY.

Then,

  1. I will calculate sum(amount) and count() for every account for every DAY of cold body.
  2. I will not create a dummy records, if particular account had no activity on some day.
  3. and I will save day, account, total amount, count in a TABLE.

if there are modifications later to the cold body, you delete and reload affected day from that table.

For hot tail there might be multiple strategies:

  1. Do the same as above (same process, clear to support)
  2. always calculate on a fly
  3. use materialized view as an averege between 1 and 2.

Cold body table totalc could also be implemented as materialized view, but if data never change - no need to rebuild it.

With this you go from (number of account) x (number of transactions per day) x (number of days) to (number of account)x(number of active days) number of records.

That should speed up all following calculations.

Upvotes: 2

Related Questions