Reputation: 494
This seems to be a reoccurring issue in the queries that I have been writing recently, and I would like to discover which query style is:
More Information:
The queries I am writing tend to always pull from one base table and will also join to several other tables; however, the joined tables often have a vertical orientation where the foreign key is referenced several times with a unique "descriptor" and "response." (See: Table #MovieDescriptions for an example.)
Please use the following SQL queries as a test scenario:
-- Drop temp tables if exist
IF OBJECT_ID('TempDB..#Movies','U') IS NOT NULL
DROP TABLE #Movies
IF OBJECT_ID('TempDB..#MovieDescriptions','U') IS NOT NULL
DROP TABLE #MovieDescriptions
-- Creating temp tables
CREATE TABLE #Movies
(
MovieID int IDENTITY(1,1),
MovieName varchar (100),
ReleaseYear datetime,
Director varchar (100)
)
CREATE TABLE #MovieDescriptions
(
MovieDescID int IDENTITY(1,1),
FK_MovieID varchar(100),
DescriptionType varchar(100),
DescriptionResponse varchar(100)
)
-- Inserting test data
INSERT INTO #Movies (MovieName, ReleaseYear, Director) VALUES ('Gone With the Wind', CONVERT(datetime,'12/15/1939'), 'Victor Fleming')
INSERT INTO #Movies (MovieName, ReleaseYear, Director) VALUES ('2001: A Space Odyssey', CONVERT(datetime,'01/01/1968'), 'Stanley Kubrick')
INSERT INTO #MovieDescriptions (FK_MovieID, DescriptionType, DescriptionResponse) VALUES ('1', 'Written By', 'Sideny Howard')
INSERT INTO #MovieDescriptions (FK_MovieID, DescriptionType, DescriptionResponse) VALUES ('1', 'Genre', 'Drama')
INSERT INTO #MovieDescriptions (FK_MovieID, DescriptionType, DescriptionResponse) VALUES ('1', 'Rating', 'G')
INSERT INTO #MovieDescriptions (FK_MovieID, DescriptionType, DescriptionResponse) VALUES ('2', 'Written By', 'Standley Kubrick')
INSERT INTO #MovieDescriptions (FK_MovieID, DescriptionType, DescriptionResponse) VALUES ('2', 'Genre', 'Sci-Fi')
INSERT INTO #MovieDescriptions (FK_MovieID, DescriptionType, DescriptionResponse) VALUES ('2', 'Rating', 'G')
-- Using subqueries
SELECT
MovieName,
ReleaseYear,
(SELECT DescriptionResponse
FROM #MovieDescriptions
WHERE FK_MovieID = #Movies.MovieID AND DescriptionType = 'Genre'
) AS Genre,
(SELECT DescriptionResponse
FROM #MovieDescriptions
WHERE FK_MovieID = #Movies.MovieID AND DescriptionType = 'Rating'
) AS Rating
FROM #Movies
-- Using aggregate functions
SELECT
MovieName,
ReleaseYear,
MAX(CASE WHEN md.DescriptionType = 'Genre' THEN DescriptionResponse END) AS Genre,
MAX(CASE WHEN md.DescriptionType = 'Rating' THEN DescriptionResponse END) AS Rating
FROM #Movies m
INNER JOIN #MovieDescriptions md
ON m.MovieID = md.FK_MovieID
GROUP BY MovieName, ReleaseYear
In addition, if there is a better way of selecting this data, that would be helpful as well.
Upvotes: 4
Views: 1985
Reputation: 28920
I like this version of your query better..
SELECT
MovieName,
ReleaseYear,
MAX(CASE WHEN md.DescriptionType = 'Genre' THEN DescriptionResponse END) AS Genre,
MAX(CASE WHEN md.DescriptionType = 'Rating' THEN DescriptionResponse END) AS Rating
FROM #Movies m
INNER JOIN #MovieDescriptions md
ON m.MovieID = md.FK_MovieID
GROUP BY MovieName, ReleaseYear
But This involves a sort on movies tables and this sort cost is 63% of total cost and you are restricted in getting all columns..
I rewrote this version using Apply which doesn't have sort and you are not restricted in getting all columns..
select
* from
#movies m
cross apply
(
select
max(case when descriptiontype='genre' then descriptionresponse end) as genre,
max(case when descriptiontype='rating' then descriptionresponse end) as rating
from
#MovieDescriptions md
where md.fk_movieid=m.movieid)b
execution plan of above version..
Now Both queries doesn't have indexes..
What are the indexes that are needed for First query to work and scale..
An additional index on MovieName, ReleaseYear to avoid sort and to tell SQLServer these are unique..since you already had (assuming) primary key on Movieid ,having one more index to avoid sort is ,some thing which i don't prefer and if you need more columns from first query,you will have to add it again to index..
For The second query to scale better,
you will need an index on FK_movieid which i assume you have it already..
PS:
I may have missed bunch of things,But i would go with my query ..
Upvotes: 1
Reputation: 36513
Assuming a more normal setup where your tables are properly indexed, and where the foreign key relationship columns have matching data types (hint hint: they don't currently match, int
vs. varchar
), then you should always find that the your second query (joins + aggregation) outperforms the first (subqueries in your select
clause). It may not be noticeable with small amounts of data. But the more your base table has data (#Movies
), the more the difference will be obvious.
The reason is simple. In your first query:
SELECT
MovieName,
ReleaseYear,
(SELECT DescriptionResponse
FROM #MovieDescriptions
WHERE FK_MovieID = #Movies.MovieID AND DescriptionType = 'Genre'
) AS Genre,
(SELECT DescriptionResponse
FROM #MovieDescriptions
WHERE FK_MovieID = #Movies.MovieID AND DescriptionType = 'Rating'
) AS Rating
FROM #Movies
If #Movies
contains 1000 rows, then SQL Server really has no choice but to perform one full table scan on #Movies
, and, for each of the 1000 rows, it needs to perform 2 additional queries on #MovieDescriptions
. In reality, you are executing a total of 2001 queries. Because your subqueries are in the SELECT
clause, SQL Server has no choice but to execute the query in this way.
On the other hand, your 2nd query:
SELECT
MovieName,
ReleaseYear,
MAX(CASE WHEN md.DescriptionType = 'Genre' THEN DescriptionResponse END) AS Genre,
MAX(CASE WHEN md.DescriptionType = 'Rating' THEN DescriptionResponse END) AS Rating
FROM #Movies m
INNER JOIN #MovieDescriptions md
ON m.MovieID = md.FK_MovieID
GROUP BY MovieName, ReleaseYear
Because you are using a join here, this gives SQL Server the flexibility of figuring out the most efficient way of joining the data from #Movies
and #MovieDescriptions
. Depending on your indexes, filters, number of rows, etc. it may decide to do a hash join, maybe it will use nested loops, etc. The point is that SQL Server has more options, and can now figure out the best way of reducing the number data block reads from your 2 tables (and indexes).
EDIT: I should also add that the above assumes that you are fetching every row returned from the query. If the query returns thousands of rows, but you only fetch the first 10 rows, then, in certain cases, the first query could actually outperform the second. That's because the subqueries will only execute on the rows as they get selected or fetched. If you never fetch some of the rows, you may never incur the cost of executing the subqueries on those unfetched rows. Something to consider.
Upvotes: 1
Reputation: 4175
There are many ways to get results from multiple tables like nested/sub queries, joins, aggrgate functions etc.
But Joins are way more efficient than subqueries when you need results from n number of tables.
It might not be observable for few tables. But when you work with 5 or 6 tables simultaneously, you will observer how joins perform.
Upvotes: 0