Reputation: 327
This is a snippet from my stored Proc
SELECT NULL AS StoryID
, AlbumID
, CAST(NULL as varchar) AS StoryTitle
, AlbumName
, (SELECT URL FROM AlbumPictures AS AlbumPictures_3 WHERE (AlbumID = Albums.AlbumID) AND (AlbumCover = 'True')) AS AlbumCover
, Votes
, CAST(NULL as Int) AS PictureId
, 'albums' AS tableName
, (SELECT NestedAlbums.AlbumID FROM NestedAlbums WHERE (AlbumID = Albums.AlbumID)) AS Flag
INTO #Results2
FROM Albums WHERE AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))
I have used nested selects in my Query above. I am curious to know whether Nested Selects
are better than LEFT/Right JOINS
OR should i use JOINS
?
Table Albums:
Table NestedAlbums:
Upvotes: 6
Views: 9850
Reputation: 11007
The first difference is, the FROM
clause governs initial cardinality of the results.
In your case, the results will have one row per row in Album. Scalar subqueries in the SELECT
clause cannot change this. If a subquery happens to return multiple rows, SQL Server will throw an exception. It will never add to the results.
When you move this logical join to the FROM
clause, you redefine your initial cardinality. It is no longer one row per row in Album, but one row per row in Album LEFT OUTER JOIN AblumPictures ON...
etc. If that yields multiple rows per row in Album, SQL Server will not throw an exception as it did for the subselect. Rather, it will add rows to the results.
So in this regard, the subquery does a better job expressing intent, and a better job proactively protecting you from data that violates that intent: "give me one row per Album, and for each album, include the URL for here, the nested ID from there" etc.
HOWEVER, functionally speaking, there is a huge drawback: scalar subqueries cannot return whole tuples. You've gone through all this work to write the subquery, SQL Server has gone through all this work to execute it, and now you are limited to this one single scalar return value! Sometimes that's fine, but sometimes you need more. When you need more, you need the FROM
clause.
The closest FROM
-clause equivalent to a scalar subquery is not OUTER JOIN
, but rather the wondrous OUTER APPLY
. OUTER APPLY
is not a scalar expression: it return whole tuples, and any number of rows.
First approximation:
SELECT Albums.*, AlbumPictures.URL, NestedAlbums.AlbumID
FROM Albums
OUTER APPLY (
SELECT TOP (1) * FROM AlbumPictures
WHERE (AlbumID = Albums.AlbumID) AND (AlbumCover = 'True')
) AlbumPictures
OUTER APPLY (
SELECT TOP (1) * FROM NestedAlbums
WHERE (AlbumID = Albums.AlbumID)
) NestedAlbums
WHERE Albums.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))
So, by virtue of TOP (1)
, Albums still governs the initial cardinality of the results. However, we now have access to all the columns from the related tables, and this is awesome.
Then, if we are confident that TOP (1)
is not necessary--that by virtue of keys and indexes the subquery can only ever return one row--then we can rewrite using a simpler form:
SELECT Albums.*, AlbumPictures.URL, NestedAlbums.AlbumID
FROM Albums
OUTER APPLY (
SELECT * FROM AlbumPictures
WHERE (AlbumID = Albums.AlbumID) AND (AlbumCover = 'True')
) AlbumPictures
OUTER APPLY (
SELECT * FROM NestedAlbums
WHERE (AlbumID = Albums.AlbumID)
) NestedAlbums
WHERE Albums.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))
Which is now logically equivalent to OUTER JOIN
:
SELECT Albums.*, AlbumPictures.URL, NestedAlbums.AlbumID
FROM Albums
LEFT OUTER JOIN AlbumPictures
ON AlbumPictures.AlbumID = Albums.AlbumID
AND AlbumPictures.AlbumCover = 'True'
LEFT OUTER JOIN NestedAlbums
ON NestedAlbums.AlbumID = Albums.AlbumID
WHERE Albums.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))
And there you have it. Which is better? Well, whatever you do, please keep it simple.
PERFORMANCE WISE, generally, there is not a whole heck of a lot of difference between the forms. You can compare execution plans side-by-side for your particular tables and indexes. It's a good learning experience to see how SQL Server rewrites logically equivalent queries. I'd expect to see identical plans for OUTER APPLY
(w/o TOP (1)
) and LEFT OUTER JOIN
.
Upvotes: 4
Reputation: 452977
In general writing this an explicit OUTER JOIN
would be better.
SQL Server will probably need to add an Assert to the plan with the sub query version that verifies the sub query only returns at most one row (unless this is guaranteed by a unique index). This can limit the possible transformations available. See Scalar Subqueries for more about this.
Also (though not relevant to the example in your question as both sub queries are different) writing as an explicit JOIN
allows you to use multiple columns from the joined table with one lookup whereas using separate similar subqueries would not (SQL Server has no logic to detect the common sub expressions).
Edit:
Following discussion in comments something like
SELECT NULL AS StoryID,
A.AlbumID,
CAST(NULL AS VARCHAR(30)) AS StoryTitle,
A.AlbumName,
AP.URL AS AlbumCover,
A.Votes,
CAST(NULL AS INT) AS PictureId,
'albums' AS tableName,
CASE
WHEN EXISTS (SELECT *
FROM NestedAlbums NA
WHERE NA.AlbumID = A.AlbumID
AND ( AccountId = @AccountId )) THEN 1
ELSE 0
END AS Flag
INTO #Results2
FROM Albums A
LEFT OUTER JOIN AlbumPictures AP
ON ( AP.AlbumID = A.AlbumID )
AND ( AP.AlbumCover = 'True' )
WHERE A.AlbumID IN (SELECT StringVal
FROM funcListToTableInt(@whereAlbumID))
You may notice this still has a sub query in the SELECT
list but CASE ... EXISTS
will be implemented efficiently as a semi join.
At the moment your query assumes that at most one matching row per album will be returned from AlbumPictures
and would error out if this assumption is not true. This changes the semantics in that no error will be returned and you would get multiple rows with the various URL
s. If you didn't want that to happen you would need to define which URL
to use and add a GROUP BY
Upvotes: 8
Reputation: 4637
This is a guess query based on Martin's suggestion of using an OUTER JOIN. I believe when he said an "explicit" join, he was referring to just choosing that technique and not a technical type of join. The query below assumes that you will only ever get back one row From the NestedAlbums and AlbumCover tables when searching for a single Album ID - if this is not true, you will get "duplicate rows" and will have to add some more conditions to one of the join clauses to remove them. This query also assumes that AlbumCover is in the Albums table. If not, you would have to change the text a.AlbumCover
to ac.AlbumCover
SELECT NULL AS StoryID
, a.AlbumID
, CAST(NULL as varchar) AS StoryTitle
, a.AlbumName
, ac.URL AS AlbumCover
, a.Votes
, CAST(NULL as Int) AS PictureId
, 'albums' AS tableName
, na.AlbumID AS Flag
INTO #Results2
FROM Albums a
LEFT OUTER JOIN NestedAlbums na ON a.AlbumID = na.AlbumID
LEFT OUTER JOIN AlbumCover ac ON a.AlbumID = ac.AlbumID AND a.AlbumCover = 'True'
WHERE a.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))
This query takes the results from Albums, and includes any rows in NestedAlbums that have a matching AlbumID, and also any rows that have a matching AlbumID in the AlbumCover table (but only if the AlbumCover field in Albums is "True" for that row). Because we've chosen LEFT OUTER JOIN as our operator, if either NestedAlbums or AlbumCover does not have matching rows, SQL Server will return NULL for those fields, but the rows will be returned by the query anyway. If the joins had been INNER JOIN, then if there were not matching rows in the joined tables, the row from the main table would be filtered out also.
Upvotes: 2
Reputation: 40289
For my money, queries written using joins instead of nested selects are "better" because they are easier for people to read and understand (think maintenance, support, future modification, etc.) All the table joining logic is in one place (the from
clause), and all the columns being returned are in one other place (select
clause), it just makes it easier to figure out what's going on.
Others have pointed out that there may or may not be perfomance hits executing the query (+1 to @Martin Smith, as usual, for pointing out obscure-but-very-relevant detail), and those answers are what you're looking for here... but take a minute or two to consider the poor developer in the future who has to figure out what the code that's been dumped on them is actually trying to do. After all, it might be you...
Upvotes: 1
Reputation: 2150
In your case both approaches will be almost the same, if you consider performance. ( Considering you have well indexed tables in your data base)
JOIN is easier to use and efficient if too many JOIN are required. JOIN may create duplicate data while joining, but Nested queries will not.
But finally the performance will be based on the amount and the organization of the data you have. You can use SQL performance monitor tools for validating based on your data.
Upvotes: 0