anti_ml

Reputation: 481

Oracle time based analytics

I have a requirement to calculate summary statistics aggregated by specific custom time periods. Specifically, a restaurant chain is open 24 hours a day. I need to calculate statistics like total sales by period, where the periods are "Breakfast", "Lunch", "Dinner" and "Overnight". For this company, the official day for which they track statistics begins after dinner. So the 24 hour period that constitutes an official day starts at 8PM CST and runs until 8PM CST the next day. That is one period. Another period is "Overnight", which runs from 8PM to 5:30AM. I put these definitions into a table called "tdef" like so:

drop table tdef cascade constraints 
;

create table tdef 
(
    cd char(3) not null,
    start_ts date not null,
    stop_ts date not null 
)

Then I insert the definitions into the tdef table, stored as dates where the start date is always Jan 1 1900; if a period spans midnight, its stop date is Jan 2 1900. Like so:

insert into tdef (start_ts, stop_ts, cd) 
values
(
to_date('1900/01/01 20:00:00', 'yyyy/mm/dd hh24:mi:ss'),
to_date('1900/01/02 19:59:59', 'yyyy/mm/dd hh24:mi:ss'),
'24H'
);

insert into tdef (start_ts, stop_ts, cd) 
values
(
to_date('1900/01/01 10:30:00', 'yyyy/mm/dd hh24:mi:ss'),
to_date('1900/01/01 13:29:59', 'yyyy/mm/dd hh24:mi:ss'),
'LUN'
);

insert into tdef (start_ts, stop_ts, cd) 
values
(
to_date('1900/01/01 20:00:00', 'yyyy/mm/dd hh24:mi:ss'),
to_date('1900/01/02 05:29:59', 'yyyy/mm/dd hh24:mi:ss'),
'ON'
);

I have a very large table (about 2.5 billion rows) which contains all register transactions. I need to summarize sales by date (their definition of 8PM-8PM), product, and time period, and store this in a table for fast-access reporting. The table should look like this:

Dec 12 2011, Hamburger, 24H, 1000
Dec 12 2011, Hamburger, ON, 100
Dec 12 2011, Hamburger, LUN, 400
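
A minimal DDL sketch of that reporting table (the name sales_summary and the column names are my guesses, not from the post):

create table sales_summary
(
    dt          date         not null,  -- "official" 8PM-to-8PM business date
    product     varchar2(50) not null,
    cd          char(3)      not null,  -- time period code ('24H', 'ON', 'LUN', ...)
    total_sales number       not null
);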

Here is what I did to accomplish this. I added two date columns to the transaction table, holding the time of the transaction placed on 1/1/1900 and on 1/2/1900, like so:

to_date(concat('01/01/1900 ', tran_tm), 'mm/dd/yyyy hh24:mi'),
to_date(concat('01/02/1900 ', tran_tm), 'mm/dd/yyyy hh24:mi')
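
Spelled out, that population step might look roughly like this; the column names ts1 and ts2 come from the queries further down, while the alter/update framing is my assumption:

alter table trans add (ts1 date, ts2 date);

update trans
   set ts1 = to_date(concat('01/01/1900 ', tran_tm), 'mm/dd/yyyy hh24:mi'),  -- time of day anchored to 1/1/1900
       ts2 = to_date(concat('01/02/1900 ', tran_tm), 'mm/dd/yyyy hh24:mi');  -- same time of day anchored to 1/2/1900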

I indexed these two columns. Then I created a cross-lookup table that associates transaction ids with time codes. Each transaction may fall into more than one time definition. It looks like this (a DDL sketch follows the listing):

24H, 1
24H, 2
24H, 3
...
LUN, 100
LUN, 101
LUN, 102
...
ON, 1
ON, 2
...
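
A sketch of that cross-reference table's DDL; the name tdef_trans and the columns trans_id, cd and dt are inferred from the queries below, so treat them as assumptions:

create table tdef_trans
(
    trans_id number  not null,  -- register transaction id
    cd       char(3) not null,  -- time period code from tdef
    dt       date    not null   -- "official" business date for that period
);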

I used two insert ... select statements to accomplish this (only the select portions are shown here):

select  t.trans_id, td.cd, to_date(to_char(to_date(concat(to_char(t.ts, 'mm/dd/yyyy '), to_char(td.stop_ts, 'hh24:mi:ss')), 'mm/dd/yyyy hh24:mi:ss'), 'yyyymmdd'), 'yyyymmdd')
from trans t, tdef td
where t.ts1 >= td.start_ts and t.ts1 <= td.stop_ts

select  t.trans_id, td.cd, to_date(to_char(to_date(concat(to_char(t.ts, 'mm/dd/yyyy '), to_char(td.stop_ts, 'hh24:mi:ss')), 'mm/dd/yyyy hh24:mi:ss'), 'yyyymmdd'), 'yyyymmdd')
from trans t, tdef td
where t.ts2 >= td.start_ts and t.ts2 <= td.stop_ts
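
For completeness, the first statement wrapped as an actual insert ... select into that assumed tdef_trans table would be (I am taking ts to be the full transaction timestamp column on trans, as in the post):

insert into tdef_trans (trans_id, cd, dt)
select t.trans_id,
       td.cd,
       -- transaction date combined with the period's stop time, then truncated to a date
       to_date(to_char(to_date(concat(to_char(t.ts, 'mm/dd/yyyy '), to_char(td.stop_ts, 'hh24:mi:ss')), 'mm/dd/yyyy hh24:mi:ss'), 'yyyymmdd'), 'yyyymmdd')
  from trans t, tdef td
 where t.ts1 >= td.start_ts and t.ts1 <= td.stop_ts;

The second statement is identical except that it filters on t.ts2.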

The third field is the "official date". The way this works: assume a transaction happened at 12/12/2011 8:01PM. Then the ts1 field would be 1/1/1900 8:01PM and the ts2 field would be 1/2/1900 8:01PM. In the first query, this row would join to the cds '24H' and 'ON', and the official date would calculate as 12/13/2011 for '24H' and 12/13/2011 for 'ON'. This transaction would not join in the second query because ts2 falls outside the date ranges. Now assume a transaction happened at 12/13/2011 12:05PM. It would join 'LUN' for the date 12/13/2011 in the first query (via ts1) and '24H' for the date 12/13/2011 in the second query (via ts2).

Once I have this table, it is easy to aggregate:

select tdef_trans.dt, sum(sales)
from trans, tdef_trans
where trans.trans_id = tdef_trans.trans_id and tdef_trans.cd = 'LUN'
group by tdef_trans.dt
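
And the full load of the reporting table sketched earlier would be along these lines (sales_summary, product and sales are names I am assuming here):

insert into sales_summary (dt, product, cd, total_sales)
select x.dt, t.product, x.cd, sum(t.sales)
  from trans t, tdef_trans x
 where t.trans_id = x.trans_id
 group by x.dt, t.product, x.cd;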

Although this solution appears to be working, I am betting there is a more elegant way to do this. Any ideas?

Upvotes: 2

Views: 830

Answers (2)

Adam Musch

Reputation: 13583

Adding an I/O for every record in the transaction table to map the second of the transaction to the business period seems like a steep price to pay. Perhaps you could instead store and pivot the data, like the query below:

select case 
         when txn_ts - trunc(txn_ts) >= numtodsinterval(20, 'hour')
           then trunc(txn_ts) + 1 
           else trunc(txn_ts)     
       end as business_day,
       sum (case when (   txn_ts - trunc(txn_ts) >= numtodsinterval(20, 'hour')
                       or txn_ts - trunc(txn_ts) <  numtodsinterval(5.5, 'hour'))
                 then txn_amt else 0 end) as overnight_sales,
       sum (case when (    txn_ts - trunc(txn_ts) >= numtodsinterval(5.5, 'hour')
                       and txn_ts - trunc(txn_ts) <  numtodsinterval(11, 'hour'))
                 then txn_amt else 0 end) as breakfast_sales,
       sum (case when (    txn_ts - trunc(txn_ts) >= numtodsinterval(11, 'hour')
                       and txn_ts - trunc(txn_ts) <  numtodsinterval(16, 'hour'))
                 then txn_amt else 0 end) as lunch_sales,
       sum (case when (    txn_ts - trunc(txn_ts) >= numtodsinterval(16, 'hour')
                       and txn_ts - trunc(txn_ts) <  numtodsinterval(20, 'hour'))
                 then txn_amt else 0 end) as dinner_sales
  from txn_table
 group by case when txn_ts - trunc(txn_ts) >= numtodsinterval(20, 'hour')
             then trunc(txn_ts) + 1 
             else trunc(txn_ts)     
          end 

So for every business day, you've got four values, one for each segment of the business day. (I put in guesses as to the breakfast/lunch and lunch/dinner breakpoints.) Building aggregations off of this table should be pretty easy.

See Creating Histograms with User-Defined Buckets in the Oracle Data Warehousing Guide for other examples, including a non-pivoted version.
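
For reference, a non-pivoted variant of the same idea might look like this (the period codes and the 11-hour and 16-hour breakpoints are still my guesses):

select business_day, period_cd, sum(txn_amt) as period_sales
  from (select case
                 when txn_ts - trunc(txn_ts) >= numtodsinterval(20, 'hour')
                   then trunc(txn_ts) + 1
                   else trunc(txn_ts)
               end as business_day,
               case
                 when txn_ts - trunc(txn_ts) >= numtodsinterval(20, 'hour')
                   or txn_ts - trunc(txn_ts) <  numtodsinterval(5.5, 'hour') then 'ON'
                 when txn_ts - trunc(txn_ts) <  numtodsinterval(11, 'hour')  then 'BRK'
                 when txn_ts - trunc(txn_ts) <  numtodsinterval(16, 'hour')  then 'LUN'
                 else 'DIN'
               end as period_cd,
               txn_amt
          from txn_table)
 group by business_day, period_cd

The '24H' day total then just falls out of summing these rows per business_day.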

Upvotes: 1

rejj

Reputation: 1216

If you are trying to do data warehousing (it sounds like it), then you may find it easiest to make a table that has every second of the day in it, along with the period each second belongs to. That is only 86,400 rows.

Then your query becomes a relatively simple join to this time dimension.
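
A sketch of generating such a table; time_dim, sec_of_day and the meal-period breakpoints are assumptions on my part (only the 8PM-5:30AM overnight window comes from the question):

create table time_dim as
select level - 1 as sec_of_day,   -- 0 .. 86399 seconds past midnight
       case
         when level - 1 >= 20 * 3600 or level - 1 < 5.5 * 3600 then 'ON'
         when level - 1 < 11 * 3600                            then 'BRK'
         when level - 1 < 16 * 3600                            then 'LUN'
         else                                                       'DIN'
       end as cd
  from dual
connect by level <= 86400;

A transaction would then join on to_number(to_char(txn_ts, 'sssss')) = time_dim.sec_of_day, and the 24-hour business-day total does not need the lookup at all.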

Upvotes: 2
