Reputation: 31
I'm still in the process of learning SQL. I've done similar queries in two different ways, and was wondering which is better and why.
UPDATE R
SET R.something = 1
FROM Table1 R
JOIN Table2 U
ON R.value1 = U.value2
WHERE
U.value3 BETWEEN 1 AND 5
or
UPDATE R
SET R.something = 1
WHERE R.value1 IN
(SELECT U.value2
FROM U
WHERE
U.value3 BETWEEN 1 AND 5
)
Upvotes: 3
Views: 125
Reputation: 647
First query is better.
Relational databases, regardless of the actual DBMS you're using, are built exactly to join data in that manner and filter it with a where clause. It's their bread and butter. In the second query, you are using a subquery to gather additional data. That's totally cool, and relational databases will churn through that just fine too. But, with the subquery, and in this specific case, you'll just end up with two queries, one to get the U data, and then the outer query will occur, using the data from your subquery to set the R data.
Here's the tricky bit though. In your query, your subquery completely references a separate table. So it'll still be fast. That subquery is contained to just U data. You'll get 2 queries - get the U data, then update R data using the U data. But if you wrote a similar query where the subquery referenced data from R, then you wouldn't get two separate queries. You'd end up doing a full table scan of all the data in R, which would be considerably slower.
Editing for more completeness: as others have said, a lot of it comes down to what DBMS you're using and what it's best at. And when first learning SQL (I'm by no means an expert) one of the hurdles is realizing that there are SO many ways to do the same thing, to get the same results, and then often end up getting optimized to the same thing. So finding the "right" way is often futile, as there isn't a distinct "right" way. I try and write no only for correctness and speed, but also for maintainability - and I find that subqueries can be harder on the brain than necessary. I try to do without them if I can avoid them (so long as the alternative isn't a cursor or something :-D).
Upvotes: 1
Reputation: 4148
It depends on the database and the database version Searching on whether IN or JOIN is better (faster) returns different results (IN is faster or JOIN is faster) depending on the database and even on the SQL statement - it can even change depending on the database version. It is always a good idea to test with multiple data sizes and different platforms!
(Query #2 is the easiest to read but may or may not be slower)
Upvotes: 0
Reputation: 35706
It may depend on the engine you are using but I believe MS SQL Server will optimize both to the same query plan.
If an engine does not I would suggest it is inferior in this instance.
Upvotes: 0
Reputation: 10118
It all depends on the database that you plan to use (Oracle, SQL Server and etc), its version and sometimes on the amount of data in your tables. But generally you should prefer JOINs, as they are easier for most optimizers and has less gotchas with nulls.
Upvotes: 1
Reputation: 1269603
Your question does not have a single answer. SQL is a descriptive language, not a procedural language. It depends on the database engine which is going to be more efficient. In addition, indexes can have a big effect on performance.
Your two queries, by the way, are not equivalent. The first can return multiple rows, when there are multiple values in "U". The version with the "IN" does an implicit "DISTINCT". To fix this you would need to add in a specific "DISTINCT".
UPDATE R
SET R.something = 1
FROM Table1 R
JOIN (select distinct value2
from Table2 U
WHERE U.value3 BETWEEN 1 AND 5
) u
ON R.value1 = U.value2
Also, although I personally like the "FROM" statement in the update, not all databases support it. The version with the "IN" is compatible across a wider range of database engines.
Upvotes: 2
Reputation: 763
Better will be query with JOIN, because it will be faster than sub-select.
Upvotes: 0