galford13x

Reputation: 2531

Database Design Question regarding duplicate information

I have a database that contains a history of product sales. For example the following table

CREATE TABLE SalesHistoryTable (
    OrderID   INT,            -- order number, unique across all orders
    ProductID INT,            -- key into the product information table
    Price     DECIMAL(10, 2), -- per-unit price of the product at the time of the order
    Quantity  INT,            -- quantity of the product in the order
    Total     DECIMAL(10, 2), -- total cost of the order for the product (Price * Quantity)
    Date      DATE,           -- date of the order
    StoreID   INT,            -- store that created the order
    PRIMARY KEY (OrderID)
);

The table will eventually hold millions of transactions. From this, profiles can be created for products in different geographical regions (based on the StoreID). Creating these profiles as a database query can be very time-consuming. For example:

SELECT ProductID, StoreID,
       SUM(Total) AS Total,
       SUM(Quantity) AS QTY,
       SUM(Total) / SUM(Quantity) AS AvgPrice
FROM SalesHistoryTable
GROUP BY ProductID, StoreID;

The above query could be used to get this information per product for any particular store. You could then determine which store has sold the most, which has made the most money, and which on average sells for the most/least. This would be too costly to run as a normal query whenever it’s needed. What are some design decisions that would allow these types of queries to run faster, assuming storage size isn’t an issue?

For example, I could create another table with duplicate information: StoreID (key), ProductID, TotalCost, QTY, AvgPrice. I could then add a trigger so that when a new order is received, the entry for that store is updated in the new table. The cost of the update is almost nothing.
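The trigger-maintained summary table described above can be sketched as follows — a minimal illustration using SQLite via Python, where the name SalesSummaryTable and the insert-or-ignore-then-accumulate trigger shape are assumptions, not something fixed by the question:

```python
import sqlite3

# Minimal sketch of the proposed design, using SQLite for illustration.
# SalesSummaryTable and the trigger name are assumed, not from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesHistoryTable (
    OrderID   INTEGER PRIMARY KEY,
    ProductID INTEGER,
    Price     REAL,
    Quantity  INTEGER,
    Total     REAL,
    Date      TEXT,
    StoreID   INTEGER
);

-- Precomputed per-store, per-product totals.
CREATE TABLE SalesSummaryTable (
    StoreID   INTEGER,
    ProductID INTEGER,
    TotalCost REAL,
    QTY       INTEGER,
    PRIMARY KEY (StoreID, ProductID)
);

-- Fold each new order into the summary as it arrives:
-- make sure the summary row exists, then accumulate into it.
CREATE TRIGGER UpdateSalesSummary AFTER INSERT ON SalesHistoryTable
BEGIN
    INSERT OR IGNORE INTO SalesSummaryTable (StoreID, ProductID, TotalCost, QTY)
    VALUES (NEW.StoreID, NEW.ProductID, 0, 0);
    UPDATE SalesSummaryTable
       SET TotalCost = TotalCost + NEW.Total,
           QTY = QTY + NEW.Quantity
     WHERE StoreID = NEW.StoreID AND ProductID = NEW.ProductID;
END;
""")

conn.executemany(
    "INSERT INTO SalesHistoryTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 2.50, 4, 10.00, "2009-06-01", 7),
        (2, 100, 2.00, 2, 4.00, "2009-06-02", 7),
        (3, 100, 3.00, 1, 3.00, "2009-06-02", 8),
    ],
)

# Reading a store's profile is now a key lookup instead of a full-table scan.
for row in conn.execute(
    "SELECT StoreID, ProductID, TotalCost, QTY, TotalCost / QTY AS AvgPrice "
    "FROM SalesSummaryTable ORDER BY StoreID"
):
    print(row)
```

One thing to keep in mind: every order insert now also writes to a shared summary row, so concurrent inserts for the same store/product contend on that row.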

What should be considered when given the above scenario?

Upvotes: 1

Views: 185

Answers (4)

Eric Petroelje

Reputation: 60498

This is normally something you would use a data warehouse for, but aside from that, using a trigger to update a second table is a perfectly viable option.

You could also have a second table that is populated by a batch job on a periodic basis (a more data-warehouse-like option). You could also use materialized views if your database supports them.
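The batch-job option can be sketched like this — again SQLite via Python for illustration; the summary table name, the `refresh_summary` helper, and the rebuild-from-scratch strategy are all assumptions:

```python
import sqlite3

# Illustrative setup: same history table as in the question (SQLite types assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesHistoryTable (
    OrderID INTEGER PRIMARY KEY, ProductID INTEGER, Price REAL,
    Quantity INTEGER, Total REAL, Date TEXT, StoreID INTEGER
);
""")
conn.executemany(
    "INSERT INTO SalesHistoryTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 2.50, 4, 10.00, "2009-06-01", 7),
        (2, 100, 2.00, 2, 4.00, "2009-06-02", 7),
    ],
)

def refresh_summary(conn):
    """Rebuild the precomputed profile table from scratch.

    Meant to be called from a periodic batch job (e.g. a nightly scheduler).
    Readers query SalesSummaryTable and never pay for the GROUP BY scan;
    the trade-off is that results are only as fresh as the last refresh.
    """
    conn.executescript("""
        DROP TABLE IF EXISTS SalesSummaryTable;
        CREATE TABLE SalesSummaryTable AS
        SELECT ProductID, StoreID,
               SUM(Total) AS Total,
               SUM(Quantity) AS QTY,
               SUM(Total) / SUM(Quantity) AS AvgPrice
        FROM SalesHistoryTable
        GROUP BY ProductID, StoreID;
    """)

refresh_summary(conn)
print(conn.execute("SELECT * FROM SalesSummaryTable").fetchall())
```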

Upvotes: 2

Erwin Smout

Reputation: 18408

"The cost for the update is almost nothing."

Except that all updates must now be serialized. No matter what, the ancient law of physics still holds: no two things can be in the same place at the same time.

Upvotes: 0

newdayrising

Reputation: 3802

You may want to look into using materialized views, so the expensive aggregation only runs periodically, when the view is refreshed.

Upvotes: 1

gbn

Reputation: 432261

I'd consider:

  • a data warehouse/OLAP solution
  • (as you said) run your data mining queries against a separate precomputed table/dataset
  • indexed/materialised views, which are almost the same as the previous point

There are some questions though:

  • do you expect real time data?
  • what is your write volume?
  • what DB engine?

Upvotes: 1
