user1593175
user1593175

Reputation: 327

Nested Selects Or JOINS?

This is a snippet from my stored Proc

SELECT  NULL AS StoryID
      , AlbumID
      , CAST(NULL as varchar) AS StoryTitle
      , AlbumName
      , (SELECT URL FROM AlbumPictures AS AlbumPictures_3 WHERE (AlbumID = Albums.AlbumID) AND (AlbumCover = 'True')) AS AlbumCover
      , Votes
      , CAST(NULL as Int) AS PictureId
      , 'albums' AS tableName
      , (SELECT NestedAlbums.AlbumID FROM NestedAlbums WHERE (AlbumID = Albums.AlbumID)) AS Flag
INTO #Results2
FROM Albums WHERE AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))

I have used nested selects in my Query above. I am curious to know whether Nested Selects are better than LEFT/Right JOINS OR should i use JOINS?

Table Albums:

enter image description here

Table NestedAlbums:

enter image description here

Upvotes: 6

Views: 9850

Answers (5)

Peter Radocchia
Peter Radocchia

Reputation: 11007

The first difference is, the FROM clause governs initial cardinality of the results.

In your case, the results will have one row per row in Album. Scalar subqueries in the SELECT clause cannot change this. If a subquery happens to return multiple rows, SQL Server will throw an exception. It will never add to the results.

When you move this logical join to the FROM clause, you redefine your initial cardinality. It is no longer one row per row in Album, but one row per row in Album LEFT OUTER JOIN AblumPictures ON... etc. If that yields multiple rows per row in Album, SQL Server will not throw an exception as it did for the subselect. Rather, it will add rows to the results.

So in this regard, the subquery does a better job expressing intent, and a better job proactively protecting you from data that violates that intent: "give me one row per Album, and for each album, include the URL for here, the nested ID from there" etc.

HOWEVER, functionally speaking, there is a huge drawback: scalar subqueries cannot return whole tuples. You've gone through all this work to write the subquery, SQL Server has gone through all this work to execute it, and now you are limited to this one single scalar return value! Sometimes that's fine, but sometimes you need more. When you need more, you need the FROM clause.

The closest FROM-clause equivalent to a scalar subquery is not OUTER JOIN, but rather the wondrous OUTER APPLY. OUTER APPLY is not a scalar expression: it return whole tuples, and any number of rows.

First approximation:

SELECT Albums.*, AlbumPictures.URL, NestedAlbums.AlbumID
FROM Albums
OUTER APPLY (
  SELECT TOP (1) * FROM AlbumPictures
  WHERE (AlbumID = Albums.AlbumID) AND (AlbumCover = 'True')
  ) AlbumPictures
OUTER APPLY (
  SELECT TOP (1) * FROM NestedAlbums 
  WHERE (AlbumID = Albums.AlbumID)
  ) NestedAlbums 
WHERE Albums.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))

So, by virtue of TOP (1), Albums still governs the initial cardinality of the results. However, we now have access to all the columns from the related tables, and this is awesome.

Then, if we are confident that TOP (1) is not necessary--that by virtue of keys and indexes the subquery can only ever return one row--then we can rewrite using a simpler form:

SELECT Albums.*, AlbumPictures.URL, NestedAlbums.AlbumID
FROM Albums
OUTER APPLY (
  SELECT * FROM AlbumPictures
  WHERE (AlbumID = Albums.AlbumID) AND (AlbumCover = 'True')
  ) AlbumPictures
OUTER APPLY (
  SELECT * FROM NestedAlbums 
  WHERE (AlbumID = Albums.AlbumID)
  ) NestedAlbums 
WHERE Albums.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))

Which is now logically equivalent to OUTER JOIN:

SELECT Albums.*, AlbumPictures.URL, NestedAlbums.AlbumID
FROM Albums
LEFT OUTER JOIN AlbumPictures 
  ON AlbumPictures.AlbumID = Albums.AlbumID 
 AND AlbumPictures.AlbumCover = 'True'
LEFT OUTER JOIN NestedAlbums 
  ON NestedAlbums.AlbumID = Albums.AlbumID
WHERE Albums.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))

And there you have it. Which is better? Well, whatever you do, please keep it simple.

PERFORMANCE WISE, generally, there is not a whole heck of a lot of difference between the forms. You can compare execution plans side-by-side for your particular tables and indexes. It's a good learning experience to see how SQL Server rewrites logically equivalent queries. I'd expect to see identical plans for OUTER APPLY (w/o TOP (1)) and LEFT OUTER JOIN.

Upvotes: 4

Martin Smith
Martin Smith

Reputation: 452977

In general writing this an explicit OUTER JOIN would be better.

SQL Server will probably need to add an Assert to the plan with the sub query version that verifies the sub query only returns at most one row (unless this is guaranteed by a unique index). This can limit the possible transformations available. See Scalar Subqueries for more about this.

Also (though not relevant to the example in your question as both sub queries are different) writing as an explicit JOIN allows you to use multiple columns from the joined table with one lookup whereas using separate similar subqueries would not (SQL Server has no logic to detect the common sub expressions).

Edit:

Following discussion in comments something like

SELECT NULL                      AS StoryID,
       A.AlbumID,
       CAST(NULL AS VARCHAR(30)) AS StoryTitle,
       A.AlbumName,
       AP.URL                    AS AlbumCover,
       A.Votes,
       CAST(NULL AS INT)         AS PictureId,
       'albums'                  AS tableName,
       CASE
         WHEN EXISTS (SELECT *
                      FROM   NestedAlbums NA
                      WHERE  NA.AlbumID = A.AlbumID
                             AND ( AccountId = @AccountId )) THEN 1
         ELSE 0
       END                       AS Flag
INTO   #Results2
FROM   Albums A
       LEFT OUTER JOIN AlbumPictures AP
         ON ( AP.AlbumID = A.AlbumID )
            AND ( AP.AlbumCover = 'True' )
WHERE  A.AlbumID IN (SELECT StringVal
                     FROM   funcListToTableInt(@whereAlbumID)) 

You may notice this still has a sub query in the SELECT list but CASE ... EXISTS will be implemented efficiently as a semi join.

At the moment your query assumes that at most one matching row per album will be returned from AlbumPictures and would error out if this assumption is not true. This changes the semantics in that no error will be returned and you would get multiple rows with the various URLs. If you didn't want that to happen you would need to define which URL to use and add a GROUP BY

Upvotes: 8

NYCdotNet
NYCdotNet

Reputation: 4637

This is a guess query based on Martin's suggestion of using an OUTER JOIN. I believe when he said an "explicit" join, he was referring to just choosing that technique and not a technical type of join. The query below assumes that you will only ever get back one row From the NestedAlbums and AlbumCover tables when searching for a single Album ID - if this is not true, you will get "duplicate rows" and will have to add some more conditions to one of the join clauses to remove them. This query also assumes that AlbumCover is in the Albums table. If not, you would have to change the text a.AlbumCover to ac.AlbumCover

SELECT  NULL AS StoryID
      , a.AlbumID
      , CAST(NULL as varchar) AS StoryTitle
      , a.AlbumName
      , ac.URL AS AlbumCover
      , a.Votes
      , CAST(NULL as Int) AS PictureId
      , 'albums' AS tableName
      , na.AlbumID AS Flag
INTO #Results2
FROM Albums a
LEFT OUTER JOIN NestedAlbums na ON a.AlbumID = na.AlbumID
LEFT OUTER JOIN AlbumCover ac ON a.AlbumID = ac.AlbumID AND a.AlbumCover = 'True'
WHERE a.AlbumID IN (SELECT StringVal FROM funcListToTableInt(@whereAlbumID))

This query takes the results from Albums, and includes any rows in NestedAlbums that have a matching AlbumID, and also any rows that have a matching AlbumID in the AlbumCover table (but only if the AlbumCover field in Albums is "True" for that row). Because we've chosen LEFT OUTER JOIN as our operator, if either NestedAlbums or AlbumCover does not have matching rows, SQL Server will return NULL for those fields, but the rows will be returned by the query anyway. If the joins had been INNER JOIN, then if there were not matching rows in the joined tables, the row from the main table would be filtered out also.

Upvotes: 2

Philip Kelley
Philip Kelley

Reputation: 40289

For my money, queries written using joins instead of nested selects are "better" because they are easier for people to read and understand (think maintenance, support, future modification, etc.) All the table joining logic is in one place (the from clause), and all the columns being returned are in one other place (select clause), it just makes it easier to figure out what's going on.

Others have pointed out that there may or may not be perfomance hits executing the query (+1 to @Martin Smith, as usual, for pointing out obscure-but-very-relevant detail), and those answers are what you're looking for here... but take a minute or two to consider the poor developer in the future who has to figure out what the code that's been dumped on them is actually trying to do. After all, it might be you...

Upvotes: 1

Wolf
Wolf

Reputation: 2150

In your case both approaches will be almost the same, if you consider performance. ( Considering you have well indexed tables in your data base)

JOIN is easier to use and efficient if too many JOIN are required. JOIN may create duplicate data while joining, but Nested queries will not.

But finally the performance will be based on the amount and the organization of the data you have. You can use SQL performance monitor tools for validating based on your data.

Upvotes: 0

Related Questions