In Data Warehousing, Should a Data Model have less+large Dimension Tables or split into more+small Dimension Tables?

I've just started learning Data Modeling in DW. And now I'm a bit confused about choosing with field to build a Dimension Table and whether should I split into many DIM tables For example, I have a Employee Table and it's columns as below:

    EmployeeID
    FirstName
    LastName
    State
    Age
    Department
    Position
    Skin Color
    Hair Color
    Sale Amount (Fact)

Emp_DIM table content every column above except Sale Amount

and a Fact table that contains

EmployeeID (Surrogate Key)
Sale Amount (Fact)

Ok, put the Date DIM aside. I want to ask about the Emp_DIM table

  1. Should I add Department_DIM(ID, Name, HeadPerson), State_DIM (ID, Name, StateCode), Position_DIM (ID, Name, Description), and Color_DIM (ID, Name) to represent Department, State, Position, Color? Then the Fact_Table will also have DepartmentID, StateID,... in it. Because sometimes I saw Department_DIM, Position_DIM and State_DIM table on the internet
  2. How can I know which attribute should be a separate DIM_table? Because in the above example, I can still query everything like Total Sale Amount per Department, per State, per Position,... with SQL and join with 1 DIM table only.

According to what I read on the internet, the DIM table should have as many fields as possible, and contain descriptive information. But I'm still confused and not sure.

Upvotes: 1

Views: 829

Answers (2)

BayouKid
BayouKid

Reputation: 79

The employee dimension and sales fact table that you showed here are intuitive and logical; that's a good start. On the surface I don't see any reason to split those up, as you're just going to create more surrogate keys to manage.

I would only split your dimension if there's a need for a fact to reference those entities separately. Here's an example of why this might be necessary.

For example: Jane Smith (an employee) sold product X at location B because she was covering for someone else that day, but she is normally assigned to location A. Ergo, her employee dimension record would show that she is normally associated with location A, but the fact record for that particular sale would show it as having occurred at location B.

That's a good example of why you'd have a separate location dimension ... but you'll note that I'm still including the employee's assigned location in the employee dimension.

But if there's not a good reason compelling you to split your dimension, then I wouldn't. It's just going to make your schema more complex, your ETL's (or ELT's) more complex, etc.

Upvotes: 2

David Browne - Microsoft
David Browne - Microsoft

Reputation: 89361

Should I add Department_DIM

If a Department has other attributes you need to report on, then you should consider splitting it out. This "normalization of dimensions" is also called "snowflaking", or moving from a star schema to a snowflake schema. But it's totally optional and people do it both ways.

Then the Fact_Table will also have DepartmentID

This a separate decision. This is denormalizing by putting non-key dimension attributes directly on your fact table, and it's a performance optimization. You might do this for a few key attributes.

Upvotes: 0

Related Questions