Reputation: 594
This is more of a preference question, but I was wondering what people think the optimal option would be. I have a Question, an Answer, and a Point table (Point is its own table because I need to track which user made the point).
Table Dump
Question:
id
title
Answer:
id
question_id
user_id
response
Point_Answer:
id
answer_id
user_id
points
With this layout, getting the top answer requires a complicated join:
SELECT t2.id, t2.user_id, t2.response, t2.points
FROM Question AS t1
JOIN (SELECT qa.id, qa.question_id, qa.user_id, qa.response,
             SUM(pa.points) AS points
      FROM Answer AS qa
      JOIN Point_Answer AS pa ON qa.id = pa.answer_id
      GROUP BY qa.id) AS t2 ON t1.id = t2.question_id
WHERE t1.id = %s
ORDER BY t2.points DESC
LIMIT 1
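To make the join concrete, here is a minimal runnable sketch of the normalized top-answer query (with an explicit ORDER BY ... LIMIT 1 to pick the single highest-scoring answer), using SQLite and sample data invented purely for illustration:

```python
import sqlite3

# Schema follows the table dump above; the data is made up for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Question (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Answer (id INTEGER PRIMARY KEY, question_id INTEGER,
                     user_id INTEGER, response TEXT);
CREATE TABLE Point_Answer (id INTEGER PRIMARY KEY, answer_id INTEGER,
                           user_id INTEGER, points INTEGER);
INSERT INTO Question VALUES (1, 'Which schema?');
INSERT INTO Answer VALUES (10, 1, 7, 'Normalize.'), (11, 1, 8, 'Denormalize.');
INSERT INTO Point_Answer VALUES (1, 10, 2, 3), (2, 10, 3, 2), (3, 11, 4, 4);
""")

# Sum the points per answer in a subquery, then pick the top answer
# for the requested question.
row = conn.execute("""
SELECT t2.id, t2.user_id, t2.response, t2.points
FROM Question AS t1
JOIN (SELECT qa.id, qa.question_id, qa.user_id, qa.response,
             SUM(pa.points) AS points
      FROM Answer AS qa
      JOIN Point_Answer AS pa ON qa.id = pa.answer_id
      GROUP BY qa.id) AS t2 ON t1.id = t2.question_id
WHERE t1.id = ?
ORDER BY t2.points DESC
LIMIT 1
""", (1,)).fetchone()
print(row)  # (10, 7, 'Normalize.', 5)
```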
Whereas if I changed it like this:
Question:
id
title
Answer:
id
question_id
user_id
response
points
Point_Answer:
id
answer_id
user_id
points
The query would be much lighter:
SELECT A.id, A.user_id, A.response, A.points
FROM Question AS Q
JOIN Answer AS A ON Q.id = A.question_id
WHERE Q.id = %s
ORDER BY A.points DESC
LIMIT 1
It would also mean I have to make sure that whenever a Point_Answer row is added, Answer.points gets updated as well: basically an extra UPDATE. So it comes down to "integrity vs. redundancy" and a bit of optimization. Which would be the better way to go?
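A minimal sketch of what that extra UPDATE looks like in practice, assuming the denormalized schema above (SQLite via Python's sqlite3; the add_vote helper is a name invented for the example). The key point is that the insert and the update happen in one transaction:

```python
import sqlite3

# Denormalized layout: Answer carries a running points total.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Answer (id INTEGER PRIMARY KEY, question_id INTEGER,
                     user_id INTEGER, response TEXT, points INTEGER DEFAULT 0);
CREATE TABLE Point_Answer (id INTEGER PRIMARY KEY, answer_id INTEGER,
                           user_id INTEGER, points INTEGER);
INSERT INTO Answer (id, question_id, user_id, response)
VALUES (10, 1, 7, 'Use a covering index.');
""")

def add_vote(conn, answer_id, user_id, points):
    # Both writes succeed or neither does, keeping Answer.points in sync.
    with conn:  # sqlite3 connection context manager = commit or rollback
        conn.execute("INSERT INTO Point_Answer (answer_id, user_id, points) "
                     "VALUES (?, ?, ?)", (answer_id, user_id, points))
        conn.execute("UPDATE Answer SET points = points + ? WHERE id = ?",
                     (points, answer_id))

add_vote(conn, 10, 2, 3)
add_vote(conn, 10, 3, 2)
total = conn.execute("SELECT points FROM Answer WHERE id = 10").fetchone()[0]
print(total)  # 5
```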
Upvotes: 2
Views: 348
Reputation: 1124
It depends on many factors, most of which depend on your setup.
The two most important factors are:
So basically, I would stick with the first solution. Departing from a normalized relational schema can be a good thing sometimes, but you should only denormalize your structure if you are sure it will give you a performance boost, and only after you have identified the bottleneck in your application in a production-like environment.
Upvotes: 2
Reputation: 32690
If the query performs reasonably well, I would leave it as is. An ugly, well-performing query beats redundancy in my book.
With the redundancy option, you need to make sure you encapsulate your update statements in a transaction to make sure everything gets updated; otherwise, you run the risk of having your data out of sync.
I've worked with some legacy apps that went the redundancy route without transactions, and when one table doesn't get updated for whatever reason, it gets messy.
Upvotes: 1
Reputation: 96572
It would depend on how slow the first query is, not on the complexity of the join. It would be an extremely poor idea to do this solely because you don't want to write (one time) a more complex query. Performance is the only real reason to do something of this nature.
If the first is unacceptably slow, then a table or field summing the points can be an acceptable denormalization, if and ONLY if you keep the field updated through a trigger, not from the application (the only way to ensure the accuracy of the denormalized number). You would need to test the solution, including the extra update time, to determine whether you have indeed saved any processing time. This may depend on how often the numbers are changed. For instance, if you add a second to the update time and save ten seconds on the select, but you run 10,000 updates for every select, this is not a good optimization. However, if you make a report go from an hour to milliseconds and only add a millisecond to the insert or update, it might be acceptable.
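The trigger approach can be sketched as follows, again with SQLite syntax and the question's table names; an AFTER INSERT trigger on Point_Answer keeps the denormalized Answer.points current no matter which application code performs the insert (a production version would also need DELETE and UPDATE triggers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Answer (id INTEGER PRIMARY KEY, question_id INTEGER,
                     user_id INTEGER, response TEXT, points INTEGER DEFAULT 0);
CREATE TABLE Point_Answer (id INTEGER PRIMARY KEY, answer_id INTEGER,
                           user_id INTEGER, points INTEGER);

-- The trigger maintains the denormalized total inside the database.
CREATE TRIGGER point_answer_ai AFTER INSERT ON Point_Answer
BEGIN
    UPDATE Answer SET points = points + NEW.points WHERE id = NEW.answer_id;
END;

INSERT INTO Answer (id, question_id, user_id, response) VALUES (10, 1, 7, 'x');
INSERT INTO Point_Answer (answer_id, user_id, points) VALUES (10, 2, 3);
INSERT INTO Point_Answer (answer_id, user_id, points) VALUES (10, 3, 4);
""")
total = conn.execute("SELECT points FROM Answer WHERE id = 10").fetchone()[0]
print(total)  # 7
```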
There is no way to answer this without actually coding and testing both solutions with production level workload and data.
Upvotes: 5