Reputation: 9468
I'm doing some performance testing and the StackOverflow2010 DB is great for this.
Looking here for testing multi-column joins SQL WHERE.. IN clause multiple columns
The StackOverflow DB is available here: https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent/
I'm trying to query for Comments for Posts for specific users:
-- get users id < 5
SELECT *
FROM [Users]
WHERE Id < 5;
-- get posts for those users
SELECT *
FROM [Posts]
WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5);
-- get comments for these posts
SELECT *
FROM [Comments]
WHERE [PostId] IN (SELECT [Id]
FROM [Posts]
WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5))
AND [UserId] IN (SELECT [OwnerUserId]
FROM [Posts]
WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5));
-- THIS version seems to select ALL comments and takes forever!
SELECT *
FROM [Comments]
WHERE EXISTS (SELECT *
FROM [Posts]
WHERE [OwnerUserId] IN (SELECT [Id] FROM [Users] WHERE Id < 5));
The IN
query works but it selects some Comments that the OwnerUserId
didn't make (orphans).
The EXISTS
query (which seems to be the correct way) seems to select ALL comments and takes forever!
Any ideas? The reason for this is explained here: https://sproket.github.io/Persism/n+1.html
Upvotes: 0
Views: 70
Reputation: 96004
You are massively overly complicating the issue here.
Firstly, let's have a look at your second query. This doesn't actually need to reference the Users
table at all. You're looking for details of Posts where the User ID is < 5, and the User ID is already in the Posts
table, OwnerUserId
. So you could just write your second query as:
SELECT *
FROM dbo.Posts
WHERE OwnerUserId < 5;
For your next query, the same applies again, you don't need the subqueries, the ID of the user is already in the Comments
table:
SELECT *
FROM dbo.Comments
WHERE UserId < 5;
As for your EXISTS
the reason it isn't working is because it isn't correlated. As such the subquery in your EXISTS
always returns at least one row, and so the EXISTS
evaluates to TRUE; and so every row in the table Comments
is returned.
A correlated query would look something like this (this is not relevant to the queries you have, as they are overly complicated):
SELECT {Columns}
FROM dbo.SomeTable ST
WHERE EXISTS (SELECT 1
FROM dbo.OtherTable OT
WHERE OT.OtherInt > 10
AND OT.SomeID = ST.ID);
Note the clause OT.SomeID = ST.ID
which means that the tables dbo.SomeTable
and dbo.OtherTable
have some relationship. As such the above query would only return rows from dbo.SomeTable
where a related row existed in dbo.OtherTable
and that related row has a value greater than 10
in the column OtherInt
.
Upvotes: 2