nico

Reputation: 51640

Should I duplicate data in my DB?

I have a MySQL DB containing entries for the pages of a website. Let's say it has fields like:

Table pages:

id  |  title  | content | date | author

Each of the pages can be voted on by users, so I have two other tables:

Table users:
id  |  name  | etc etc etc

Table votes:
id  |  id_user | id_page | vote

Now, I have a page where I show a list of the pages (10-50 at a time) with various information along with the average vote of the page.

So, I was wondering whether it would be better to:

a) Run the query to display the pages (note that this is already fairly heavy as it queries three tables) and then for each entry do another query to calculate the mean vote (or add a 4th join to the main query?).

or

b) Add an "average vote" column to the pages table, which I will update (along with the votes table) when a user votes on the page.


Upvotes: 2

Views: 84

Answers (3)

Alexis Dufrenoy

Reputation: 11946

Honestly, for this issue, I would recommend redundant information. Recomputing the average over many votes for many pages on every page view can really create a heavy load for a server, in my opinion. If you foresee real traffic on your website, of course... :-)

Upvotes: 1

Paul Sonier

Reputation: 39480

Use the database for what it's meant for; option a is by far your best bet. It's worth noting that your query isn't actually particularly heavy, joining three tables; SQL really excels at this sort of thing.

Be cautious of this sort of attempt at premature optimization of SQL; SQL is far more efficient at what it does than most people think it is.

Note that another benefit of using your option a is that there's less code to maintain, and less chance of data diverging as code gets updated; it's a lifecycle benefit, and those are too often ignored for minuscule optimization benefits.
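As a rough illustration of option a, the listing and the average can come from a single query rather than one extra query per page. This is only a sketch under some assumptions: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL, it assumes pages.author holds a users.id, and the index name and sample rows are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the SELECT itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    -- Assumption: pages.author stores the id of a row in users.
    CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, content TEXT,
                        date TEXT, author INTEGER REFERENCES users(id));
    CREATE TABLE votes (id INTEGER PRIMARY KEY, id_user INTEGER,
                        id_page INTEGER, vote INTEGER);
    -- Hypothetical index so the per-page aggregate can look votes up by page.
    CREATE INDEX idx_votes_page ON votes (id_page);

    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO pages VALUES (1, 'First', '...', '2011-01-01', 1),
                             (2, 'Second', '...', '2011-01-02', 2);
    INSERT INTO votes (id_user, id_page, vote) VALUES (1, 1, 4), (2, 1, 2);
""")

def list_pages(conn, limit=50):
    """Return (id, title, author name, average vote) for one listing page."""
    return conn.execute("""
        SELECT p.id, p.title, u.name, AVG(v.vote) AS avg_vote
        FROM pages AS p
        JOIN users AS u ON u.id = p.author
        LEFT JOIN votes AS v ON v.id_page = p.id  -- keep pages with no votes
        GROUP BY p.id, p.title, u.name
        ORDER BY p.id
        LIMIT ?
    """, (limit,)).fetchall()

rows = list_pages(conn)
```

The LEFT JOIN matters: a page nobody has voted on still shows up, with a NULL (None in Python) average instead of disappearing from the list.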

Upvotes: 6

John

Reputation: 16007

You might "repeat yourself" (violate DRY) for the sake of performance. The trade-offs are (a) extra storage, and (b) extra work in keeping everything self-consistent within your DB.

There are advantages/disadvantages both ways. Optimizing too early has its own set of pitfalls, though.
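To make the "extra work" in trade-off (b) concrete, here is a minimal sketch of option b, again using Python's sqlite3 as a stand-in for MySQL. The avg_vote column and the cast_vote and drifted_pages helpers are hypothetical names chosen for the example. The idea is that the vote and the cached average are written in one transaction, and a separate check reports pages where the two have diverged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- avg_vote is the duplicated (denormalized) column from option b.
    CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, avg_vote REAL);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, id_user INTEGER,
                        id_page INTEGER, vote INTEGER);
    INSERT INTO pages (id, title) VALUES (1, 'First');
""")

def cast_vote(conn, id_user, id_page, vote):
    """Insert a vote and refresh the cached average in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO votes (id_user, id_page, vote) VALUES (?, ?, ?)",
            (id_user, id_page, vote),
        )
        conn.execute(
            "UPDATE pages SET avg_vote = "
            "(SELECT AVG(vote) FROM votes WHERE id_page = ?) WHERE id = ?",
            (id_page, id_page),
        )

def drifted_pages(conn):
    """Return ids of pages whose cached average no longer matches votes."""
    # IS NOT is SQLite's NULL-safe inequality; MySQL would use NOT (a <=> b).
    return [row[0] for row in conn.execute("""
        SELECT p.id FROM pages AS p
        WHERE p.avg_vote IS NOT (SELECT AVG(v.vote) FROM votes AS v
                                 WHERE v.id_page = p.id)
    """)]

cast_vote(conn, 1, 1, 4)
cast_vote(conn, 2, 1, 1)
```

Any code path that writes to votes without going through something like cast_vote leaves avg_vote stale, which is exactly the self-consistency cost mentioned above.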

Upvotes: 1
