Reputation: 9487
I've been reading a lot about Relational Databases using many JOIN statements on every SELECT. However, I've been wondering if there's any performance problem on the long run when abusing this method.
For example, lets say we have a users
table. I would usually add the "most used" data, instead of doing any extra JOINs. When I say the "most used" data, for instance, would be the username, display picture and location.
This data would always be needed when displaying any user interaction on the website, example: on every comments
table JOIN for articles
. Instead of doing a JOIN on the users
& users_profiles
tables to get the 'location' and 'display', just use the information on users
table.
That's my approach, however I do know that there are a lot of excellent and experienced programmers that can give me a word of advice about this matter.
My questions are:
Should I try to be conservative with the JOINs? or should I use them more? Why?
Are there any performance problems on the long run when using JOIN a lot?
Note: I must clarify, that I'm not trying to avoid JOINS at all. I use them only when needed. On this example would be comment/article authors, extra profile information that only displays on user profiles pages... etc.
Upvotes: 5
Views: 2160
Reputation: 1620
As others have said - joins aren't a thing to avoid at all. In fact, in most models it is rare not to have a few joins in every single query that the application runs.
Even in the biggest queries they aren't usually a performance problems - and often fix performance problems that would occur if you have redundant and repeating data all over the place.
However, be aware that under the cover the database just joins two tables at a time. So, joins necessitate multiple steps by the database that are invisible to the developer. When it does these joins it has to make a few decisions about how to go about it:
So, if your joins are complex ultimately the efficiency will be driven by the sophistication of your optimizer/planner and the currency and detail of your statistics. MySQL isn't a strong contender here - so I'd generally keep my model and sql a little simpler than if I was using something else. But a few joins per query should almost always be fine.
Upvotes: 0
Reputation: 10637
Fear not joining. The relational model is strong and you should employ it. Someone always discussed N+1, but also consider--in your context--joining against users often for security purposes too as the query can additionally mandate user existence, status, session correctness, and field expectation.
Many large sites go so far as to have session table and http request table for every request, always joined against each other for the page queries. Benefit is that parameters are always matched to sessions, sessions to proper users, user status always checked, &c &c but moreso that it allows for some interesting scale-out benefits.
Long story, do it wisely, but don't skimp on joining.
Upvotes: 0
Reputation: 625097
My advice on data modeling is:
More in Database Development Mistakes Made by AppDevelopers.
Now as for directness of a model, let me give you an example. Let's say you're designing a system for authentication and authorization of users. An overengineered solution might look something like this:
So you need 6 joins to get from the username entered to the actual privileges. Sure there might be an actual requirement for this but more often than not this kind of system is put in because of the hand-wringing by some developer thinking they might someday need it even though every user only has one alias, user to login is 1:1 and so on. A simpler solution is:
and, well, that's it. Perhaps if you need a complex role system but it's also quite possible that you don't and if you do it's reasonably easy to slot in (user type becomes a foreign key into a user types or roles table) or it's generally straightforward to map the old to the new.
This is thing about complexity: it's easy to add and hard to remove. Usually it's a constant vigil against unintended complexity, which is bad enough without going and making it worse by adding unnecessary complexity.
Upvotes: 8
Reputation: 28844
Some bright person once said:
Normalize until it hurts, denormalize until it works!
It all depends on the type of joins, and the join conditions, but there are nothing wrong with them. Joins ON table1.PK = table2.FK are very efficient.
Upvotes: 6
Reputation: 166406
If the data is 1 <-> 1, and you will not have many null fields, dont over normalize. You can still specify the fields required ("most used data") in the select statements.
Upvotes: 1