Reputation: 7325

How to properly design a sql database to use an aggregate value?

In the following example

Order
-------
ID (int)
CreatedAt (smalldatetime)
....

OrderItems
-------
ID (int)
OrderID (int)
Quantity (int)
UnitPrice (decimal)
CalculationUnit (int)
TotalItemPrice (decimal)
....

I have a big dillema where and how should i keep track of TotalOrderPrice and I my main concern is speed and data consistency.

a) TotalOrderPrice could be stored in table Orders and should be updated on each OrderItem change for relevant OrderID

Could this lead to data inconsistency since data is "duplicated"?

b) I could have a view that could hold summed TotalItemPriceValues such as

OrderTotal
------------
ID (int)
TotalOrderprice (decimal)

Could this be a potencial issue when scaling application?

c) Or I could leave the original design as it is, and calculate OrderTotalPrice inside a business logic.

Could this slow down performance since all order items should have to be retrieved in order to get the total order price?

I know there are no silver bullets, but since I dont have large amount of data for testing, I just want to do facts check and see what would be a proper reasoning to find the solution here?

Upvotes: 2

Answers (5)

Jo Douglass

Reputation: 2085

In general, my take is that you should avoid breaking the rules of normalisation until you need to. That means avoiding data redundancy in order to avoid update anomalies, and calculating things on the fly. I've seen a lot of terrible databases created because a developer worried that one day the database might not cope with the application load; in truth, in a well-designed, well-indexed, and well-maintained database this is rare. RDBMSes are a very good tool for dealing with large amounts of normalised data in transactional systems, if your database is designed and maintained correctly.

This doesn't mean you need to do the calculations in your application logic, though - and in fact I'd avoid that. Instead, make a view (looking like the query Tim Biegeleisen suggested in his answer) that does the calculations. If sometime down the road you find that this doesn't scale well, you can change the table and the view, plus whatever is populating this table - this minimises the disruption to your application if this change is needed. If the table is populated via a stored procedure then you might not need to make any changes to your front end application logic at all in order to switch from calculating on the fly to pre-calculated.

Upvotes: 1

Thorsten Kettner

Reputation: 94914

Here is my second answer, which is very different from my first one...

Usually you avoid redundancy in databases, because it can lead to inconsistencies. What would you do for instance, if you saw some day that an order's stored TotalOrderPrice doesn't match what you calculate from the positions? To avoid such inconsistencies, we avoid redundancies.

In a datawarehouse, however, you invite redundancy in order to have faster access to the data. That means you could have an order system containing the pure Order and OrderItems tables and have a datawarehouse system that gets updated in intervals and has an Order table with a column for the TotalOrderPrice.

Thinking this further ... Does an order ever change in your system? If not than why not store what you print, i.e. store the TotalOrderPrice redundantly. (You can use some database mechanisms to prevent orders from being partially deleted or updated to get this even safer.) If later the TotalOrderPrice really doesn't match what you calculate from the positions, then this even indicates a problem with your software at the time you wrote the order. So having stored the TotalOrderPrice is an advantage suddenly probably giving us the chancce to detect such errors and do corrections in our accounting.

Having said this: Usually an order gets written and not changed afterwards. As no changes are going to apply, you can easily store the TotalOrderPrice in the orders table and have both the advantages of seeing later what order price you sent/printed and retrieving the prices faster.

Upvotes: 1

David Aldridge

Reputation: 52346

I would consider what the access patterns are for the data, as that is what determines the relevant pros and cons.

How often will you need to:

Place a predicate on the total order amount (compute intensive unless you store the total).
Order by the total order amount (compute intensive unless you store the total).
Modify the total order amount (compute intensive and possibly a cause of error if you store the total).

If orders are never modified after creation and you frequently place predicates on the total or order by it, then I'd be confident about storing the total in the order table.

If orders are frequently modified but you very rarely need to place predicates on the total or order by it, then I would be confident in not storing the total.

The correct approach for you depends strongly on where the balance lies between those two extremes, and the risk you're willing to adopt in either poor performance or incorrect data.

Upvotes: 1

Thorsten Kettner

Reputation: 94914

An order won't contain millions of positions, so speed shouldn't be a problem you must worry about.

Your table OrderItems contains Quantity and UnitPrice and TotalItemPrice. This already seems redundant. Isn't TotalItemPrice always Quantity x UnitPrice? Provided the UnitPrice is already the gross price to pay (and not the net price where VAT must be added to get the TotalItemPrice). And provided any item discount is already included. If there'd be another column item_discount_percent for instance we might get a result with too many digits, e.g. 105.987002. Does the order then contain 105.98 or 105.99 in this example? We may want to store that value in TotalItemPrice then to make this clear. (And to make sure a new software version would still print the exact same order.) So have this column only if some calculation may lead to prices with more than two decimal places.

As to your question and a TotalOrderPrice we can apply the same thinking: If the price is just the sum of all the order's TotalItemPrice then don't store it. If there is some calculation to be done leading to too many decimal places (e.g. an order_discount_percent) you should probably store that (rounded/truncated) value.

Upvotes: 1

Tim Biegeleisen

Reputation: 521249

I would recommend against maintaining a computed column, which needs to frequently be updated, and instead to compute the order total in a query on demand, when your application needs it. You can use a query like the following which should run reasonably fast:

SELECT t1.ID, t2.OrderTotalPrice
FROM Order t1
INNER JOIN
(
    SELECT OrderID, SUM(TotalItemPrice) AS OrderTotalPrice
    FROM OrderItems
    GROUP BY OrderID
) t2
    ON t1.ID = t2.OrderID

This avoids the problem of having to maintain a computed column, which makes managing your database much easier. A strong argument against a computed column is that it doesn't really save the database any work. Rather, it always needs to be maintained, whereas computing a column on demand only needs to be done when you actually need it.

Upvotes: 2

How to properly design a sql database to use an aggregate value?

Answers (5)

Related Questions