fancyPants
fancyPants

Reputation: 51888

General understanding of star schema design

So, I think I understood what to put in dimensions, what in the fact table, and how to achieve this. Now I got the problem, that I have this dimension 'product' and a dimension 'productProperties'. I had to split this, cause otherwise my natural key in 'product' wouldn`t have been unique anymore. I asked this in this question.

So my 'productProperties' dimension table should have looked like this: Color | Material | Size

1.)To achieve this, I had to create every possible permutation of the values of 'color', 'material', 'size' and so on, right?

This would have been far over 200 million rows, so I decided to split this up. I have a dimension 'Color' now, which consists actually of the columns 'color', 'colorFront', 'colorBack'.

2.) That`s fine I guess, but what about the dimension 'size', which consists only of the columns 'surrogate_key' and 'value'?

I`ve read about 'degenerate dimensions' (in the reading recommendation given in my other question), which means, to make 'single column dimensions' one column in the fact table. That seems a bit impractical to me, since I would end up with about 5-6 extra columns in my fact table.

What if I should do this?

3.) Are those degenerate dimensions part of the primary key in the fact table?

Most important question: I will have entries in my fact table with products, that don`t match every column in my dimensions or not all dimensions at all. Meaning, I may have an entry / product, which has the property 'color', but not 'colorFront' or 'colorBack'. Since I created every permutation of 'color', 'colorFront' and 'colorBack', when trying to populate my fact table I will get multiple surrogate keys, if the product only has the property 'color' which results in kind of duplicates in my fact table, right?

4.) So do I have to filter those duplicates out, when querying my fact table? Or is this wrong at all?

I could of course split dimension 'color' in three dimensions. But then I will get entries with NULL values in some columns. Same thing with entries / products that don`t use some dimensions at all.

5.) How to handle those NULL values?

Thanks in advance for any help.

Upvotes: 1

Views: 3003

Answers (2)

Stephanie Page
Stephanie Page

Reputation: 3893

1) Whoever told you:

So my 'productProperties' dimension table should have looked like this: Color | Material | Size

was either wrong or you misunderstood.

That idea is called a "Junk Dimension". And it doesn't have to contain the Cartesian product to begin with. It can be loaded like any other dimension. If a combination is needed in the fact table and not in the dimension, you add it then. Cartesian-ing it to begin with is a convenience but then you better know that when a new color is added that you have to re-Cartesian. Better to load when needed and not worry about it.

2) OK so now I've read your whole question and realize that you're reading about Dimensional modeling but it looks like you're skimming it.

A degenerate dimension is something like Purchase Order number. It's not a fact. You can't SUM it. But it's not a Dimension either since there's nothing more than needs to be said about PO123210413. It's NOT a FK to anywhere.

Upvotes: 3

Damir Sudarevic
Damir Sudarevic

Reputation: 22187

Every dimension has:

  • primary key (DateKey, TimeKey, ProductKey, ...)
  • business key (FullDate, ProductFullName, ColorNaturalKey, ...)
  • row with value 'unknown' (Key = 0, BusinessKey = 'unknown', all other = 'n/a')
  • row with value 'n/a' (Key = -1, BusinessKey ='n/a', all other = 'n/a')

In the Color table, columns Color, ColorFront and ColorBack all have values of 'n/a' and 'unknown' -- so those should be included in permutations. This way there is always a row in the dimension table to point to.

You may choose to make the size a degenerate dimension by moving the SizeValue into the fact table and dropping the dimSize.

alt text

Upvotes: 3

Related Questions