Sara
Sara

Reputation: 31

Star Schema Query - do you have to include all dimensions(joins) in the query?

Hi I have a question regarding star schema query in MS SQL datawarehouse.

I have a fact table and 8 dimensions. And I am confused, to get the metrics from Fact, do we have to join all dimensions with Fact, even though I am not getting data from them? Is this required for the right metrics?

My fact table is huge, so that's why I am wondering for performance purposes and the right way to query.

Thanks!

Upvotes: 2

Views: 1440

Answers (2)

Walid LEZZAR
Walid LEZZAR

Reputation: 35

It is not necessary to include all the dimensions. Indeed, while exploring fact tables, It is very important to have the possibility to select only some dimensions to join and drop the others. The performance issues must not be an excuse to give up on this capability.

You have a bunch of different techniques to solve performance issues depending on the database you are using. Some common ways :

  • aggregate tables : it is one of the best way to solve performance issues. If you have a huge fact table, you can create an aggregate version of it, using only the most frequently queried columns. This way, it should be much smaller. Then, users (or the adhoc query application) has to use the aggregrate table instead of the original fact table when this is possible. The good news is that most databases know how to automatically manage aggregate tables (materialized views for example). Queries that initially target the original fact table, are transparently redirected to the aggregate table whenever possible.
  • indexing : bitmap indexing for example can be an efficient way to increase performance in a star schema fact table.

Upvotes: 0

Twan
Twan

Reputation: 101

No you do not have to join all 8 dimensions. You only need to join the dimensions that contain data you need for analyzing the metrics in the fact table. Also to increase performance make sure to only include columns from the dimension table that are needed for the analysis. Including all columns from the dimensions you join will decrease performance.

Upvotes: 1

Related Questions