I'll-Be-Back
I'll-Be-Back

Reputation: 10828

Should order_products table be denormalized?

The order_products table holds data of products with the product name and price. It has a list of records what customers have bought.

There are also two fields called product_name and price which are duplicate data from the products table.

It is worth it to normalize order_products table and create history (audit) table for product name and price? Then I don't need product_name and price in the order_products table anymore?

Upvotes: 0

Views: 406

Answers (4)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657192

I assume you need to store product name and price at the time of the order. Both will change in the course of time. If that happens a lot, your current approach may be good enough.

I would consider a normalized approach, especially if you have many rows in order_products per (product name, price). Have an additional table that stores the volatile states of a product every time they change. Could be called product_history like you already hinted. Just save the date (or timestamp) with every new state. Have a foriegn key link to the table product to preserve referential integrity. Like this:

create table product_history
(product_id    integer  -- or timestamp
,valid_from    date
,product_name  varchar
,price         decimal
,PRIMARY KEY (product_id, valid_from)
,FOREIGN KEY (product_id) REFERENCES product(product_id)
               ON DELETE CASCADE
               ON UPDATE CASCADE)

A fast query to look up the applicable volatile attributes:

SELECT *
FROM   product_history
WHERE  product_id = $my_product_id
AND    valid_from <= $my_date
ORDER  BY valid_from DESC
LIMIT  1;

You definitely need an index on (product_id, valid_from) to speed up this query. The primary key in my example will probably do.

Upvotes: 1

shikhar
shikhar

Reputation: 2469

Yes that's a good idea, but a better idea is to create one field in order_products table and dump all your order info there after serializing them. With this approach you don't have to create 2 new tables (may be more if you want to do the same for gift coupon info, shipping info etc etc)

Rationale behind the approach is that order_products are placed order which means they are "published records". Published records don't change much and shouldn't be modified. And these records should be kept for future audits.

Upvotes: 0

Alex
Alex

Reputation: 2011

It's not possible to make this judgement knowing just the database structure. It depends on how you use your database (ie. inserts, selects, updates and deletes... And how frequently?).

In one end, if your solution was a reporting solution on a read-only database, you should keep those duplicates! But if on the other end your solution is a logging solution that only logs information but never retreives, I'd go for the denormalized model you're suggesting.

Fully normalized database are not optimized for performance. You often have to denormalize your database design..

Very often a model that has a certain degree of redundant data is the fastest one. When denormalizing you just have to keep a steady eye on the balance between faster queries and slower insertions/updates!

Check these answers and maybe you'll find further help making your decision! When to Denormalize a Database Design

Upvotes: 0

apokryfos
apokryfos

Reputation: 40681

That depends. What is the purpose of that table?

In general tables like that can be used to statistical analysis of market trends so its important to have both product_name and price because the product price today may be different than what it was one month ago, but you may want to know at which prices products were most bought.

However if the presence of the price in that table is due to the fact that the price may be part of the products primary key then that is just bad practice and the key should be reduced.

Upvotes: 0

Related Questions