Reputation: 2616
I'm working on rearchitecting a reporting/data-warehouse-type database. We currently have a table at the hotel grain (i.e. HotelID plus lots of measures, including measures like Last7DaysGross, Last28DaysXXX, etc.).
I'm thinking it would be best to move to a fact table at the Hotel/StayDate grain. However, queries that group on HotelID and include date-related measures such as Last7DaysGross need to perform very well.
What kind of structures would work here? I don't think I'd be able to use indexed views the way I had hoped, because of the many restrictions on them (no subqueries, etc.). To get reasonable performance, will I need to create a new table at the Hotel level (aggregated from the Hotel/StayDate level)? That's the level at which people will most often be querying. Do I need to actually create fields such as Last7DaysGross? That doesn't seem like a good design, but I'm having a hard time coming up with another one.
Sorry this question is a little vague. Is there something else I'm missing here? I know that most often these kinds of date-related measures would be handled at the front-end level (i.e. in a tool such as Business Objects), but for this project we'll need them in the database.
thanks, Sylvia
EDIT:
Thanks for all the thoughtful comments! I accepted David Marwick's answer because of his idea of having an expanded date dimension. That thought hadn't even crossed my mind, and it sounds well worth trying.
Expanding a little on David Marwick's thoughts, I came up with this idea. I might try and see how it actually works:
DateDimension
    DateKey
    DateKeyBeginLast28Days
    DateKeyEndLast28Days

Fact
    DateKey
    GrossTransactions
Then when querying:
select
    DateDimension.DateKey
    , SumLast28Day = sum(Fact.GrossTransactions)
from Fact
join DateDimension
    on Fact.DateKey >= DateDimension.DateKeyBeginLast28Days
    and Fact.DateKey <= DateDimension.DateKeyEndLast28Days
group by DateDimension.DateKey
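For the join above to work, each DateDimension row needs its 28-day window boundaries filled in during the dimension load. A minimal sketch of that step, assuming DateKey is stored as a date type (with integer smart keys you'd convert first); the approach is just an idea, not tested yet:

-- Hypothetical dimension-load step: stamp the trailing 28-day window
-- boundaries onto every date row. Assumes DateKey is a date column.
update DateDimension
set DateKeyBeginLast28Days = dateadd(day, -27, DateKey)
  , DateKeyEndLast28Days   = DateKey;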
Upvotes: 1
Views: 1244
Reputation: 111
I think your design of having one table at the [Hotel, Date] grain and then rolling up into Hotel sounds fine. As Damir points out, it keeps your read queries simple and makes it easy to add or remove aggregate measures going forward (keeping in mind that it's generally a bad idea to design around requirements you may only have in the future).
Pondlife makes good points as well. Your qualitative requirements might dictate how feasible it is to maintain an aggregate table: for example, how often the system needs to update (daily, hourly, every 15 minutes, real time?), how accurate the measures need to be (maybe the users just need a rough sense of how well each hotel is doing), how costly it is to read the source transaction data, how available the source transaction data is in the long term (does it get archived?), etc.
If you choose to add a [Hotel, StayDate]-grain fact table and not maintain an aggregate, then perhaps you can explore some tricks in your dimensions to save time. Maybe something like a "last 7 days" date dimension containing [date, date_in_last_7_days] (so 7 records for each date), in case a straight equi-join is faster than range-querying the past 7 days; see the sketch below. That might be a contrived example, but something along those lines. Date dimensions are small.
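A minimal sketch of that idea. The table and column names here (dimDateLast7, ReportDateKey, StayDateKey, factStay, GrossAmount) are assumptions for illustration, not from the question:

-- Hypothetical helper dimension: 7 rows per reporting date, one for each
-- stay date that falls inside its trailing 7-day window.
create table dimDateLast7 (
    ReportDateKey date not null
  , StayDateKey   date not null
  , primary key (ReportDateKey, StayDateKey)
);

-- The last-7-days query then becomes a straight equi-join instead of a range join:
select
    f.HotelID
  , sum(f.GrossAmount) as Last7DaysGross
from factStay as f
join dimDateLast7 as d
    on d.StayDateKey = f.StayDate
where d.ReportDateKey = '2012-11-01'   -- the reporting date of interest
group by f.HotelID;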
Lastly, consider hardware optimizations like keeping tables in memory (especially dimensions or non-gigantic fact tables) if you need to improve performance.
Upvotes: 1
Reputation: 22187
Well, it depends. A usual query (for the last 7 days) would look something like:
select
    h.HotelName
  , sum(s.SaleAmount) as Sales
from factSale as s
join dimDate as d on d.DateKey = s.DateKey
join dimHotel as h on h.HotelKey = s.HotelKey
where d.DaysAgo between 1 and 7
group by h.HotelName
;
However, suppose you have a report which includes running sums (over periods) and their changes. The report layout may look something like this:
| Date | 1-Day | Change-1-Day % | 7-Day | Change-7-Day % | 28-Day | Change-28-Day | 90-Day | Change-90-day % |
Not so simple any more. So it is much easier to create an aggregate (fact) table with pre-calculated fields for the standard periods and run the report query over that one.
So the aggregate (fact) table may look something like
factRunningSum
----------------------------
DateKey integer (PK)
HotelKey integer (PK)
Sale_1_Day decimal(19,2)
Sale_7_Day decimal(19,2)
Sale_28_Day decimal(19,2)
Sale_90_Day decimal(19,2)
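A sketch of how such a table could be rebuilt during ETL, assuming factSale at a daily [HotelKey, DateKey] grain as in the query above, and window frames (SQL Server 2012+; on older versions a self-join or correlated subquery would do the same job):

-- Hypothetical ETL step: rebuild the running-sum aggregate from a daily
-- per-hotel rollup. The frame counts assume one row per (HotelKey, DateKey)
-- with no gaps; fill missing days with zero-sale rows first if needed.
;with DailySale as (
    select s.HotelKey, s.DateKey, sum(s.SaleAmount) as Sale
    from factSale as s
    group by s.HotelKey, s.DateKey
)
insert into factRunningSum (DateKey, HotelKey, Sale_1_Day, Sale_7_Day, Sale_28_Day, Sale_90_Day)
select
    DateKey
  , HotelKey
  , Sale
  , sum(Sale) over (partition by HotelKey order by DateKey rows between 6 preceding and current row)
  , sum(Sale) over (partition by HotelKey order by DateKey rows between 27 preceding and current row)
  , sum(Sale) over (partition by HotelKey order by DateKey rows between 89 preceding and current row)
from DailySale;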
Upvotes: 0
Reputation: 16250
As David says, there is nothing at all wrong with pre-aggregating some totals (during the ETL process) in order to make important queries run faster. This is a common technique even in operational databases and it makes a lot of sense in data warehouses if you know that certain aggregates are frequently used.
So you could certainly create a table called FactHotelRevenueSummary (or whatever matches your existing naming convention) with HotelID, Last7DaysGross, Last28DaysGross, and as many other aggregates as you like.
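A minimal sketch of what that ETL step could look like, assuming a detail fact table named FactHotelStayDate with StayDate and GrossAmount columns (those names are assumptions for illustration):

-- Hypothetical nightly rebuild of the hotel-level summary table.
-- @AsOfDate would normally be supplied by the load process.
declare @AsOfDate date = cast(getdate() as date);

truncate table FactHotelRevenueSummary;

insert into FactHotelRevenueSummary (HotelID, Last7DaysGross, Last28DaysGross)
select
    HotelID
  , sum(case when StayDate >= dateadd(day, -6,  @AsOfDate) then GrossAmount else 0 end) as Last7DaysGross
  , sum(case when StayDate >= dateadd(day, -27, @AsOfDate) then GrossAmount else 0 end) as Last28DaysGross
from FactHotelStayDate
where StayDate between dateadd(day, -27, @AsOfDate) and @AsOfDate
group by HotelID;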
The main points to consider first in my opinion are:
Upvotes: 1
Reputation: 666
Aggregate fact tables are entirely acceptable within Data Warehouses.
If you have not already done so, I would recommend reading Kimball's book.
In it, Kimball states that pre-aggregating fact tables into aggregate fact tables is fine; however, they should be the "base" fact table rolled up to a coarser level.
I suspect that reporting-only fields such as these belong in your front-end report tool or cube viewer.
Upvotes: 1