Reputation: 2616

Data warehouse design - how to design for measures that are time related, such as last 7 days, etc

I'm working rearchitecting a reporting/data warehouse type database. We currently have a table that has data at the hotel grain (i.e. HotelID plus lots of measures, including measures like Last7DaysGross, Last28DaysXXX, etc).

I'm thinking that it would be best to move to a fact table that is at the Hotel/StayDate grain. However, grouping on the HotelID and including date related measures such as Last7DaysGross need to perform very well.

What kind of structures would work here? I don't think I'd be able to use indexed views the way that I had hoped, because of the multiple restrictions on them (no subqueries, etc.) To have reasonable performance, will I need to create a new table at the Hotel level (aggregated from the HotelStayDate level?) That's the level at which people will most often be querying. Do I need to actually create fields such as Last7DaysGross? That doesn't seem like a good design, but I'm having a hard time coming up with another one.

Sorry this question is a little vague. Is there something else I'm missing here? I know most often these kind date related measures would be done at the front-end level (i.e. in a tool such as Business Objects). However, for this project, we'll need to have it in the database.

thanks, Sylvia

EDIT:

Thanks for all the thoughtful comments! I accepted David Marwick answer because of his idea of having an expanded date dimension. That thought hadn't even crossed my mind, and it sounds well worth trying.

Expanding a little on David Marwick's thoughts, I came up with this idea. I might try and see how it actually works:

DateDimension
   DateKey
   DateKeyBeginLast28Days
   DateKeyEndLast28Days

Fact
   DateKey
   GrossTransactions

Then when querying:

Select
   DateKey
   ,SumLast28Day = sum(GrossTransaction)
from Fact
   join DateDimension
      on Fact.DateKey >= DateDimension.DateKeyBeginLast28Days
      and Fact.DateKey <= DateDimension.DateKeyEndLast28Days
group by DateKey

Upvotes: 1

Answers (4)

dmarwick

Reputation: 111

I think your design of having one table at the [Hotel, Date] grain then rolling up into Hotel sounds fine. As Damir points out it keeps your read queries simple and makes it easy to add/remove aggregate measures going forward (keeping in mind that it's generally a bad idea to design around requirements that you may have in the future).

Pondlife makes good points as well. Your qualitative requirements might dictate how feasible it is to maintain an aggregate table, for example how often the system needs to update (daily, hourly, 15 mins, realtime?), how accurate the measures need to be (maybe the users just need a rough sense of how well each hotel is doing), how costly it is to read the source transaction data, how available the source transaction data is in the long-term (does it get archived), etc.

If you choose to add a [Hotel, StayDate] grain fact table and not maintain an aggregate then perhaps you can explore some tricks in your dimensions to save time. Maybe something like a 7-day date dimension containing [date, date_in_last_7_days] (so 7 records for each date) in case a straight join vs range querying the past 7 days saves you any time. That might be a stupid example but something along those lines. Date dimensions are small.

Lastly consider hardware optimizations like moving tables into memory (especially dimensions or non-gigantic fact tables) if you need to improve performance.

Upvotes: 1

Damir Sudarevic

Reputation: 22187

Well depends, a usual query (for last 7 days) would look something like

select
    HotelName
  , sum(SaleAmount) as Sales
from factSale as s
join dimDate  as d on d.DateKey  = s.DateKey
join dimHotel as h on h.HotelKey = s.HotelKey 
where DaysAgo between 1 and 7
group by HotelName 
;

However, suppose you have a report which includes running sums (over periods) and their changes. The report layout may look something like this:

| Date | 1-Day | Change-1-Day % | 7-Day | Change-7-Day % | 28-Day | Change-28-Day | 90-Day | Change-90-day % |

Not so simple any more. So, it is much easier to crate an aggregated (fact) table with pre-calculated fields for standard periods and run a query over that one.

So the aggregate (fact) table may look something like

factRunningSum
----------------------------
DateKey     integer  (PK)
HotelKey    integer  (PK)
Sale_1_Day  decimal(19,2)
Sale_7_Day  decimal(19,2)
Sale_28_Day decimal(19,2)
Sale_90_Day decimal(19,2)

Upvotes: 0

Pondlife

Reputation: 16250

As David says, there is nothing at all wrong with pre-aggregating some totals (during the ETL process) in order to make important queries run faster. This is a common technique even in operational databases and it makes a lot of sense in data warehouses if you know that certain aggregates are frequently used.

So you could certainly create a table called FactHotelRevenueSummary (or whatever matches your existing naming convention) with HotelID, Last7DaysGross, Last28DaysGross and as many other aggregates as you like.

The main points to consider first in my opinion are:

Pre-aggregation is really necessary due to observable performance issues, i.e. you are adding complexity to the database in order to solve a real problem and not because you have a vague feeling that it might be helpful
Your ETL process has checks to ensure that the aggregated data matches the 'raw' data exactly, otherwise you will get different numbers depending on which fact table you query and that has a very negative effect on user confidence

Upvotes: 1

David Adlington

Reputation: 666

Aggregate fact tables are entirely acceptable within Data Warehouses.

If you have not allready I would recommend the below book

DW Toolkit

In here Kimball states that the pre-aggregation of Fact tables into Aggregate fact tables is fine, however it does state they should be similar to the "base" fact table at a rolled up level.

The introduction of reporting fields I suspect should be in your front end Report tool or Cube viewer.

Upvotes: 1

Data warehouse design - how to design for measures that are time related, such as last 7 days, etc

Answers (4)

Related Questions