Reputation: 2752
I have a select (MS SQL) where I'm joining a parent with up to 3 children. I'd like to select all the children if ANY of the children meets a criteria (Type <> 'Done'). Performance is very important.
I've tried
SELECT p.*,c.* FROM Parent p
INNER JOIN Child c ON p.Id=c.ParentId
WHERE p.Id IN (SELECT DISTINCT c.ParentId FROM Child c2 WHERE c2.ParentId=p.Id AND c2.Type<>'Done')
But the inner select often returns 2000+ children so the IN gives bad performance. I've also tried EXISTS and count:
SELECT p.*,c.* FROM Parent p
INNER JOIN Child c ON p.Id=c.ParentId
WHERE EXISTS(SELECT 1 FROM Child c2 WHERE c2.ParentId=p.Id AND c2.Type<>'Done')
SELECT p.*,c.* FROM Parent p
INNER JOIN Child c ON p.Id=c.ParentId
WHERE 0 < (SELECT COUNT(c2.Id) FROM Child c2 WHERE c2.ParentId=p.Id AND c2.Type<>'Done')
All these return the correct result but with bad performance. I think I have all the required indexes.
So does anyone have any alternatives?
Upvotes: 0
Views: 1210
Reputation: 350
You can also use a CTE:
;WITH Parents
AS
(
SELECT
p.*
FROM
[Parent] AS p
WHERE
EXISTS
(
SELECT
1
FROM
[Child] AS c
WHERE
p.ID = c.ParentID
AND c.[Type] <> 'Done'
)
)
-- Select all children and siblings...
SELECT
p.ID AS [ParentID]
,p.[Name]
,c.ID AS [ChildID]
,c.[Type]
FROM
Parents AS p
INNER JOIN [Child] AS c
ON p.ID = c.ParentID
The query plan is pretty small and it might perform well for you.
Upvotes: 0
Reputation: 1866
New Answer:
Well then I would create a temporary table with the queries below and use them instead of your nested subqueries and try which combination is the fastest.
Update:
DECLARE @tmp TABLE(pid INT);
INSERT INTO @tmp(pid)
SELECT p.Id FROM Parent p
WHERE EXISTS (SELECT 1 FROM Child c WHERE c.ParentId=p.Id AND c2.Type<>'Done');
SELECT c.*, p.*
FROM @tmp t INNER JOIN Child c ON t.pid = c.ParentID INNER JOIN Parent p ON t.pid = p.Id
You can also declare @tmp the same way you declared Parent and fill in the whole table. This way you could avoid the JOIN with Parent. But without profiling all possible solutions it is not possible to say which one will be the fastest.
Old Answer:
I think that you don't need to join with the Child table. That leaves you with:
SELECT p.Id FROM Parent p
WHERE EXISTS (SELECT 1 FROM Child c2 WHERE c2.Type<>'Done' AND c2.Id = p.Id)
You could also try
SELECT p.Id FROM Parent p
INNER JOIN Child c ON p.Id=c.ParentId
WHERE c.Type <> 'Done'
GROUP BY p.Id
Or
SELECT DISTINCT p.Id FROM Parent p
INNER JOIN Child c ON p.Id=c.ParentId
WHERE c.Type <> 'Done'
These should all be equivalent.
Upvotes: 1
Reputation: 16641
The EXISTS variant should normally be the fastest one.
Assuming you do have all the relevent indices, there is really no way to optimize this query, other than preparing your data in temp tables. You could make a denormalized table that combines the parent and child table. This should be a massive performance increase.
This is an especially feasible solution if you never delete or update existing parent records, but only add new ones. Because in this case, you won't suffer from update anomalies.
If you do delete or update existing parent records, this would make the process of keeping your denormalized table up to date much more complex.
Upvotes: 0