user3644997
user3644997

Reputation: 99

Star Schema Structure - To many Dimensions

I have a star schema warehouse (MS SQL Server, accessed via MS Report Builder with OLAP) which has a lot of tiny dimensions - by this I mean the dimensions are built from two columns (Id and Description) with several hundred linked from the Fact tables.

This provides the option of presenting all items off the Fact even when there is not an actual count against this return (show nulls), however I am not convinced that this represents the data in the best possible way - I would rather see a small number of denormalised tables where the description was part of the Fact as this would provide better ability to query the data via SQL alongside the OLAP approach.

Is this structure of lots of one level dimensions normal and good practice? To be honest the only time I would expect to show blanks are against something such as a time or date dimension, however as these can be coerced from the data to give you the gaps in charts and tables it does not really seem to matter that much.

Any views on if this structure is good or bad - I would like to try and get this changed but if I am out of step with the best practice I would happily change my mindset.

Example of the structure (this is just part of one Fact table)

Fact table - (Property)

F_PROPERTY.PROPERTY_ID (Key for table)
F_PROPERTY.CYCLE_FRAME_TYPE_ID
F_PROPERTY.CYCLE_GEARS_NUMBER_ID
F_PROPERTY.CYCLE_GEARS_TYPE_ID
F_PROPERTY.CYCLE_GENDER_ID
F_PROPERTY.CYCLE_MUD_GUARDS_ID
F_PROPERTY.CYCLE_MUD_GUARDS_COLOUR_ID

Dimension tables -

D_CYCLE_FRAME_TYPES.CYCLE_FRAME_TYPE_ID
D_CYCLE_FRAME_TYPES.CYCLE_FRAME_TYPE_DESC

D_CYCLE_GEAR_TYPES.CYCLE_GEAR_TYPE_ID
D_CYCLE_GEAR_TYPES.CYCLE_GEAR_TYPE_DESC

D_CYCLE_GEAR_TYPES.CYCLE_GEARS_NUMBER_ID
D_CYCLE_GEAR_TYPES.CYCLE_GEARS_NUMBER_DESC

D_CYCLE_GEAR_TYPES.CYCLE_GENDERS_ID
D_CYCLE_GEAR_TYPES.CYCLE_GENDERS_DESC

D_CYCLE_GEAR_TYPES.CYCLE_MUD_GUARDS_ID
D_CYCLE_GEAR_TYPES.CYCLE_MUD_GUARDS_DESC

So rephrasing this - should the dimensions really be separate tables of the fact or would they be better with the description as part of the Fact? I want reporting to be quick and simple and with minimal dropping of records where there is no values in fields.

Upvotes: 2

Views: 2355

Answers (1)

Wes H
Wes H

Reputation: 4439

Do not put the description in the fact table. The purpose of the fact is to measure events. Dimensions show the possible attributes of an event, even if the event hasn't occurred. A restaurant menu would be a dimension, the food ordered by the customer is the fact event.

It looks like you may need to denormalize your dimensions. For example, if your cycle gear has type, number & manufacturer, make it a single cycle gear dimension with one ID and three description attributes.

You should also consider junk dimensions. These are made from multiple, unrelated, single attribute dimensions, combined to use one ID in the fact. The number of records is a Cartesian product of all the possible column attributes, but you can reduce that some by eliminating unrealistic combinations. For example, gender, ethnicity and education would be good candidates for a single junk dimension. They are unrelated, but have few values so the Cartesian product is reasonable.

The Star Schema enables very high performance reporting queries by filtering on the smaller, unique dimension attributes and then joining to the fact events. Muddying your fact tables will reduce the overall performance.

Upvotes: 3

Related Questions