zetetic
zetetic

Reputation: 47548

LEFT JOIN vs. multiple SELECT statements

I am working on someone else's PHP code and seeing this pattern over and over:

(pseudocode)

result = SELECT blah1, blah2, foreign_key FROM foo WHERE key=bar

if foreign_key > 0  
  other_result = SELECT something FROM foo2 WHERE key=foreign_key  
end

The code needs to branch if there is no related row in the other table, but couldn't this be done better by doing a LEFT JOIN in a single SELECT statement? Am I missing some performance benefit? Portability issue? Or am I just nitpicking?

Upvotes: 7

Views: 13183

Answers (13)

Sanjeev Singh
Sanjeev Singh

Reputation: 4066

There are many cases that will require different solutions and it isn't possible to explain all together.

Join scans both the tables and loops to match the first table record in second table. Simple select query will work faster in many cases as It only take cares for the primary/unique key(if exists) to search the data internally.

Upvotes: 1

Patrick Desjardins
Patrick Desjardins

Reputation: 140743

You should always try to minimize the number of query to the database when you can. Your example is perfect for only 1 query. This way you will be able later to cache more easily or to handle more request in same time because instead of always using 2-3 query that require a connexion, you will have only 1 each time.

Upvotes: 1

JohnFx
JohnFx

Reputation: 34909

You are completely correct that the single query is the way to go. To add some value to the other answers offered let me add this axiom: "Use the right tool for the job, the Database server should handle the querying work, the code should handle the procedural work."

The key idea behind this concept is that the compiler/query optimizers can do a better job if they know the entire problem domain instead of half of it.

Upvotes: 2

Astra
Astra

Reputation: 11211

Doing a simple two table join is usually the best way to go after this problem domain, however depending on the state of the tables and indexing, there are certain cases where it may be better to do the two select statements, but typically I haven't run into this problem until I started approaching 3-5 joined tables, not just 2.

Just make sure you have covering indexes on both tables to ensure you aren't scanning the disk for all records, that is the biggest performance hit a database gets (in my limited experience)

Upvotes: 1

staticsan
staticsan

Reputation: 30555

There is not enough information to really answer the question. I've worked on applications where decreasing the query count for one reason and increasing the query count for another reason both gave performance improvements. In the same application!

For certain combinations of table size, database configuration and how often the foreign table would be queried, doing the two queries can be much faster than a LEFT JOIN. But experience and testing is the only thing that will tell you that. MySQL with moderately large tables seems to be susceptable to this, IME. Performing three queries on one table can often be much faster than one query JOINing the three. I've seen speedups of an order of magnitude.

Upvotes: 5

Tamara Wijsman
Tamara Wijsman

Reputation: 12348

A single SQL query would lead in more performance as the SQL server (Which sometimes doesn't share the same location) just needs to handle one request, if you would use multiple SQL queries then you introduce a lot of overhead:

Executing more CPU instructions, sending a second query to the server, create a second thread on the server, execute possible more CPU instructions on the sever, destroy a second thread on the server, send the second results back.

There might be exceptional cases where the performance could be better, but for simple things you can't reach better performance by doing a bit more work.

Upvotes: 1

Bill Karwin
Bill Karwin

Reputation: 562240

The most likely explanation is that the developer simply doesn't know how outer joins work. This is very common, even among developers who are quite experienced in their own specialty.

There's also a widespread myth that "queries with joins are slow." So many developers blindly avoid joins at all costs, even to the extreme of running multiple queries where one would be better.

The myth of avoiding joins is like saying we should avoid writing loops in our application code, because running a line of code multiple times is obviously slower than running it once. To say nothing of the "overhead" of ++i and testing i<20 during every iteration!

Upvotes: 2

Dillie-O
Dillie-O

Reputation: 29725

The only "gotcha" to all of this is if the result set to work with contains a lot of joins, or even nested joins.

I've had two or three instances now where the original query I was inheriting consisted of a single query that had so a lot of joins in it and it would take the SQL a good minute to prepare the statement.

I went back into the procedure, leveraged some table variables (or temporary tables) and broke the query down into a lot of the smaller single select type statements and constructed the final result set in this manner.

This update dramatically fixed the response time, down to a few seconds, because it was easier to do a lot of simple "one shots" to retrieve the necessary data.

I'm not trying to object for objections sake here, but just to point out that the code may have been broken down to such a granular level to address a similar issue.

Upvotes: 1

KMessenger
KMessenger

Reputation: 81

Considering that in one database hit you have all the data you need having one single SQL statement would be better performance 99% of the time. Not sure if the connections is being creating dynamically in this case or not but if so doing so is expensive. Even if the process if reusing existing connections the DBMS is not getting optimize the queries be best way and not really making use of the relationships.

The only way I could ever see doing the calls like this for performance reasons is if the data being retrieved by the foreign key is a large amount and it is only needed in some cases. But in the sample you describe it just grabs it if it exists so this is not the case and therefore not gaining any performance.

Upvotes: 1

Mike
Mike

Reputation: 8963

This is definitely wrong. You are going over the wire a second time for no reason. DBs are very fast at their problem space. Joining tables is one of those and you'll see more of a performance degradation from the second query then the join. Unless your tablespace is hundreds of millions of records, this is not a good idea.

Upvotes: 6

BenAlabaster
BenAlabaster

Reputation: 39806

It seems to me that what you're saying is fairly valid - why fire off two calls to the database when one will do - unless both records are needed independently as objects(?)

Of course while it might not be as simple code wise to pull it all back in one call from the database and separate out the fields into the two separate objects, it does mean that you're only dependent on the database for one call rather than two...

This would be nicer to read as a query:

Select a.blah1, a.blah2, b.something From foo a Left Join foo2 b On a.foreign_key = b.key Where a.Key = bar;

And this way you can check you got a result in one go and have the database do all the heavy lifting in one query rather than two...

Yeah, I think it seems like what you're saying is correct.

Upvotes: 2

Jonathan Leffler
Jonathan Leffler

Reputation: 753495

There's a danger of treating your SQL DBMS as if it was a ISAM file system, selecting from a single table at a time. It might be cleaner to use a single SELECT with the outer join. On the other hand, detecting null in the application code and deciding what to do based on null vs non-null is also not completely clean.

One advantage of a single statement - you have fewer round trips to the server - especially if the SQL is prepared dynamically each time the other result is needed.

On average, then, a single SELECT statement is better. It gives the optimizer something to do and saves it getting too bored as well.

Upvotes: 2

hamishmcn
hamishmcn

Reputation: 7981

I'm with you - a single SQL would be better

Upvotes: 3

Related Questions