LearnByReading
LearnByReading

Reputation: 1883

Why do the set of dimension members in a fact table are typically used as a composite key?

It seems like based on my research that the "best practices" suggest that a row in the fact table would have a composite key consisting of ALL dimension measures: For instance, if my measure in the fact table is "sales" and I have four dimensions: "location, salesperson, buyerCat, salesMonth" then, my composite key would consist of unique values for those 4 dimensions. But this would cause serious issues down the road: what if I have two measures with the unique set of dimensions???

  --Example: Fact table row: 
  Sales Amount: $100, location: US, salesperson: Bob, buyerCat: Young, salesMonth: Feb/2010
  Sales Amount: $640, location: US, salesperson: Bob, buyerCat: Young, salesMonth: Feb/2010 

This measure would then be prevented from entering the data mart, just because ALL of the dimension members are being used as the composite key. Am I not correct?

Upvotes: 1

Views: 492

Answers (3)

Ntiyiso Mayile
Ntiyiso Mayile

Reputation: 31

It is best to avoid using a composite key or any business related keys for that matter to unique identify your fact table rows. I can guarantee you that you will find many records that share the same dimension keys. clearly define your fact table grain using steps provided from Kimball website and you will not have any need to worry about the uniqueness of your fact rows

Upvotes: 1

Jo Douglass
Jo Douglass

Reputation: 2085

If you're finding that this would be problematic, then you might want to look at your data model and question why you want to load multiple rows which use exactly the same dimensions.

In the example you've given, presumably those were actually different sales. If so, they perhaps took place on different days - but you're only recording at month level, so you've lost that data. If you include the exact date as a dimension, then your duplicate issue goes away. Or if the two sales could be on the same day, maybe there's a transaction number from the sale which could be recorded and used as degenerate dimension - again, you no longer have rows which use the same dimensions.

Transaction fact tables should model things based on an event - in the case of your example, a sale taking place - and they should reference enough dimensions to uniquely identify each specific incidence of that event.

If you really don't care about holding data down to the finest grain, then what you're building is not a transaction fact table, but perhaps something like a periodic snapshot fact table. In that case, you should sum up those two rows so you just have one row with a Sales Amount of $740.

However, I would be very careful about building a warehouse in this way and not building transaction fact tables which go down to the lowest grain - even if no one wants to report on or analyse down to that level right now, they might want to later on, and it'll be painful to refactor your data warehouse and your ETL to work at a lower grain. Whereas if you create a transaction fact table first at the lowest possible grain, you can always aggregate up - whether your users do that in an OLAP tool like SSAS, or you create some aggregated tables or views to make it easier for reporting purposes.

Upvotes: 3

Tim3880
Tim3880

Reputation: 2583

Unless you define the composite key as unique, you can have as many as duplicates you want.

Upvotes: 0

Related Questions