Reputation: 2752

Select all children if any one meets criteria

I have a select (MS SQL) where I'm joining a parent with up to 3 children. I'd like to select all the children if ANY of the children meets a criteria (Type <> 'Done'). Performance is very important.

I've tried

SELECT p.*,c.* FROM Parent p
INNER JOIN Child c ON p.Id=c.ParentId
WHERE p.Id IN (SELECT DISTINCT c.ParentId FROM Child c2 WHERE c2.ParentId=p.Id AND c2.Type<>'Done')

But the inner select often returns 2000+ children so the IN gives bad performance. I've also tried EXISTS and count:

SELECT p.*,c.* FROM Parent p
INNER JOIN Child c ON p.Id=c.ParentId
WHERE EXISTS(SELECT 1 FROM Child c2 WHERE c2.ParentId=p.Id AND c2.Type<>'Done')

SELECT p.*,c.* FROM Parent p
INNER JOIN Child c ON p.Id=c.ParentId
WHERE 0 < (SELECT COUNT(c2.Id) FROM Child c2 WHERE c2.ParentId=p.Id AND c2.Type<>'Done')

All these return the correct result but with bad performance. I think I have all the required indexes.

So does anyone have any alternatives?

Upvotes: 0

Answers (3)

John Hurrell

Reputation: 350

You can also use a CTE:

;WITH Parents
AS
(
    SELECT
        p.*
    FROM
        [Parent] AS p
    WHERE
        EXISTS
        (
            SELECT
                1
            FROM
                [Child] AS c
            WHERE
                p.ID = c.ParentID
                AND c.[Type] <> 'Done'
        )
)

-- Select all children and siblings...
SELECT
    p.ID AS [ParentID]
    ,p.[Name]
    ,c.ID AS [ChildID]
    ,c.[Type]
FROM
    Parents AS p
    INNER JOIN [Child] AS c
        ON p.ID = c.ParentID

The query plan is pretty small and it might perform well for you.

Upvotes: 0

Karsten

Reputation: 1866

New Answer:

Well then I would create a temporary table with the queries below and use them instead of your nested subqueries and try which combination is the fastest.

Update:

DECLARE @tmp TABLE(pid INT);

INSERT INTO @tmp(pid)
SELECT p.Id FROM Parent p
WHERE EXISTS (SELECT 1 FROM Child c WHERE c.ParentId=p.Id AND c2.Type<>'Done');

SELECT c.*, p.*
FROM @tmp t INNER JOIN Child c ON t.pid = c.ParentID INNER JOIN Parent p ON t.pid = p.Id

You can also declare @tmp the same way you declared Parent and fill in the whole table. This way you could avoid the JOIN with Parent. But without profiling all possible solutions it is not possible to say which one will be the fastest.

Old Answer:

I think that you don't need to join with the Child table. That leaves you with:

SELECT p.Id FROM Parent p
WHERE EXISTS (SELECT 1 FROM Child c2 WHERE c2.Type<>'Done' AND c2.Id = p.Id)

You could also try

SELECT p.Id FROM Parent p
INNER JOIN Child c ON p.Id=c.ParentId
WHERE c.Type <> 'Done'
GROUP BY p.Id

SELECT DISTINCT p.Id FROM Parent p
INNER JOIN Child c ON p.Id=c.ParentId
WHERE c.Type <> 'Done'

These should all be equivalent.

Upvotes: 1

wvdz

Reputation: 16641

The EXISTS variant should normally be the fastest one.

Assuming you do have all the relevent indices, there is really no way to optimize this query, other than preparing your data in temp tables. You could make a denormalized table that combines the parent and child table. This should be a massive performance increase.

This is an especially feasible solution if you never delete or update existing parent records, but only add new ones. Because in this case, you won't suffer from update anomalies.

If you do delete or update existing parent records, this would make the process of keeping your denormalized table up to date much more complex.

Upvotes: 0

Select all children if any one meets criteria

Answers (3)

Related Questions