How to deal with equally growing fact/dimension tables in datawarehouse design?

Question

I have a source data set with:
1. customer
2. customer_product_purchase
3. customer_support_plan_purchase
4. customer_support_request

All of them have a relationship such that a support request is raised against a plan and product purchase. And that a customer buys a support plan for a product (which the customer also buys).

In order to design a data warehouse schema for this, I was thinking of creating a single fact table, I thought of the following approaches:

A. Having a consolidated fact table with customer_product_purchase, customer_support_plan_purchase and customer_support_request into one as they have a few common attributes (and few uncommon ones which can be kept blank for others). As I believe they are at the same granularity (purchase of product/support plan, raising a request against a support plan). This would mean losing some specific information as to make it generic, like product warranty and support plan validity under the same name validity

B. Creating a fact table from customer_product_purchase and customer_support_plan_purchase which are inherently purchases, and can be kept together with some common and some uncommon attributes. The customer_support_request can be treated as separately.

C. Creating a fact table around customer_support_request as it ties up to both the other tables, which can be the dimensions. However, this will mean that the dimensions will also grow at the same rate as the fact (which I have read, is an indicator of bad design).

So how can I deal with such a situation where the support plan, service request and product purchase can grow by themselves individually, is it best to just keep all of them separate? But because they (all or two of them) have similar granularity, shouldn't they be consolidated?

Nick.Mc · Accepted Answer

Some points from my comments

Fact tables in star schemas should model business processes.
I suggest not trying too hard to combine facts unless it makes clear sense to do so. Facts at different detail levels are a strong indicator to not combine

Here some observations about the facts detail levels;

You can have 0 or many support plans against a single purchase. This is a different level of detail and likely a different fact
You can raise zero or many support requests against a single support plan. This is a different level of detail and likely a different fact
If you put support plan and sale in the same fact and you have two product sales for the same customer, one with zero support plans and one with three, you end up with five (2+3) records in your fact, and it's difficult to associate the support plans with the sale because they are in different rows. For example if you want a value ratio of support plan to purchase amount, for a given product group, while not impossible, it's just doesn't "smell" right to have all of these scattered through the same fact
If you have very few identified repeat customers then yes your customer dim will grow at the same rate as the fact but this is unavoidable and normal

Keep in mind there is never any "best" solution so don't get stuck in analysis paralysis. It's worth modelling in quickly in something like Power BI and throwing some business questions at it. Remember your star schema is about making business questions easier to answer.

How to deal with equally growing fact/dimension tables in datawarehouse design?

Answers (1)

Related Questions