user5900426
user5900426

Reputation:

Modeling many-to-many relationship in data warehouse

I have to design data warehouse model and ETL process for class at my University. My data warehouse has to store opinions / comments about a product, each record should consist of:

In addition data warehouse should store information about product:

I want to create data warehouse model first, but I have problem with storing product pros and cons as it is many-to-many relationship. In normal relational database I would simply create associative table, but here I am not sure how to proceed, after all I don’t want to normalize facts table.

I am considering 3 approaches, first, which I presented in diagram below. I used bridge table method (though, I don’t know if correctly) to get rid of many-to-many relationship. I don’t know how it will impact querying performance.

ER Model

Second approach I may use is boolean column method. In PROS and CONS table I can create a column for each possible value, but there can be up to 100 different pros or cons. Also number of possible pros or cons is not constant in time. Authors in their comments can list new pros or cons (that’s how it works in data source), but I can’t add new columns (I shouldn’t change data in data warehouse).

Third approach I am considering, is to keep pros in PROS table but in 1 column, where values will be separated using commas or some other delimiter e.g. “price, design, color”. It keeps things simple but hard to analyze or slice & dice.

Which approach should I use in this situation? Which is better for loading data into data warehouse, because form data source I will get all the comments and I want to only load comments that are new since last loading?

Upvotes: 2

Views: 2144

Answers (1)

Lahiru Fernando
Lahiru Fernando

Reputation: 104

What I think is, if we can get your first option little bit modified to than what you have said here, it would be the best as I understand.

in your image you have provided, having the Pros_Bridge_Detail table is fine. The rest need to be changed.

  • you can remove the pros_Bridge table that holds just the count. you can actually add that column to your COMMENT fact table you have up there. That would be more efficient and easy when it comes to queries rather than querying in many tables.
  • you said you have many areas to give pros like price, design, durability etc. Lets put those stuff into a separate dimension.
  • Add a new column to your Pros_Bridge_Detail table to hold the ID of the newly created Dimension that holds the product pro types (Design, durability etc). Now, once you add a product Pro, the Pros_Bridge_Detail table will have the pros the user give and also hold the value of regarding what the pro is given via the ID of the new dimension.
  • Also don't forget to store the Comment ID as well in Pros_Bridge_Detail table as that will be your link (FK) to Comments fact table you have.

Same can be done to Cons as well.

Hope you understand what I just explained and hope it helps. let know if you have any issues.

Upvotes: 0

Related Questions