Reputation: 103
There are two tables - users and orders:
id | first_name | orders_amount_total |
---|---|---|
1 | Jone | 5634200 |
2 | Mike | 3982830 |
id | user_id | order_amount |
---|---|---|
1 | 1 | 200 |
2 | 1 | 150 |
3 | 2 | 70 |
4 | 1 | 320 |
5 | 2 | 20 |
6 | 2 | 10 |
7 | 2 | 85 |
8 | 1 | 25 |
The tables are linked by user id. The task is to show for each user the sum of all his orders, there can be thousands of them (orders), maybe tens of thousands, while there can be hundreds and thousands of users simultaneously making a request. There are two options:
Which option is better to use? Why? Why is the other option bad?
P.S. I believe that the second option is concise and correct, given that the database is relational, but there are strong doubts about the load on the server, because the sample when calculating the amount is large even for one user, and there are many of them.
Upvotes: 0
Views: 87
Reputation: 2806
I agree with @sticky_bit that Option 2. is better than 1. There's another possibility:
Create a VIEW
that's a pre-defined invocation of the JOIN
/SUM
query. A smart DBMS should be able to infer that each time the orders
table is updated, it also needs to adjust orders_amount_total
for the user_id
.
BTW re your schema design: don't name columns id
; don't use the same column name in two different tables except if they mean the same thing.
Upvotes: 1
Reputation: 37482
Option 2. is the correct one for the vast majority of cases.
Option 1. would cause data redundancy that may lead to inconsistencies. With option 2. you're on the safe side to always get the right values.
Yes, denormalizing tables can improve performance. But that's a last resort and great care needs to be taken. "tens of thousands" of rows isn't a particular large set for an RDMBS. They are built to handle even millions and more pretty well. So you seem to be far away from the last resort and should go with option 1. and proper indexes.
Upvotes: 1