BurnsBA
BurnsBA

Reputation: 4929

Group by absorb NULL unless it's the only value

I'm trying to group by a primary column and a secondary column. I want to ignore NULL in the secondary column unless it's the only value.

CREATE TABLE #tempx1 ( Id INT, [Foo] VARCHAR(10), OtherKeyId INT );
INSERT INTO #tempx1 ([Id],[Foo],[OtherKeyId]) VALUES
(1, 'A', NULL),
(2, 'B', NULL),
(3, 'B', 1),
(4, 'C', NULL),
(5, 'C', 1),
(6, 'C', 2);

I'm trying to get output like

Foo OtherKeyId
A   NULL
B   1
C   1
C   2

This question is similar, but takes the MAX of the column I want, so it ignores other non-NULL values and won't work.

I tried to work out something based on this question, but I don't quite understand what that query does and can't get my output to work

-- Doesn't include Foo='A', creates duplicates for 'B' and 'C'
WITH cte AS (
    SELECT *,
    ROW_NUMBER() OVER (PARTITION BY [Foo] ORDER BY [OtherKeyId]) rn1
    FROM #tempx1
)
SELECT c1.[Foo], c1.[OtherKeyId], c1.rn1
FROM cte c1
INNER JOIN cte c2 ON c2.[OtherKeyId] = c1.[OtherKeyId] AND c2.rn1 = c1.rn1

This is for a modern SQL Server: Microsoft SQL Server 2019

Upvotes: 1

Views: 60

Answers (3)

BurnsBA
BurnsBA

Reputation: 4929

My actual problem is a bit more complicated than presented here, I ended up using the idea from Barbaros Özhan solution to count the number of items. This ends up with two inner queries on the data set with two different GROUP BY. I'm able to get the results I need on my real dataset using a query like the following:

SELECT
    a.[Foo],
    b.[OtherKeyId]
FROM (
    SELECT
        [Foo],
        COUNT([OtherKeyId]) [C]
    FROM #tempx1 t
    GROUP BY [Foo]
) a
JOIN (
    SELECT
        [Foo],
        [OtherKeyId]
    FROM #tempx1 t
    GROUP BY [Foo], [OtherKeyId]
) b ON b.[Foo] = a.[Foo]
WHERE
    (b.[OtherKeyId] IS NULL AND a.[C] = 0)
    OR (b.[OtherKeyId] IS NOT NULL AND a.[C] > 0)

Upvotes: 0

Barbaros Özhan
Barbaros Özhan

Reputation: 65138

You can use a GROUP BY expression with HAVING clause like below one

SELECT [Foo],[OtherKeyId]
  FROM #tempx1 t
 GROUP BY [Foo],[OtherKeyId]
HAVING SUM(CASE WHEN [OtherKeyId] IS NULL THEN 0 END) IS NULL
    OR ( SELECT COUNT(*) FROM #tempx1 WHERE [Foo] = t.[Foo] ) = 1

Demo

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269503

Hmmm . . . I think you want filtering:

select t.*
from #tempx1 t
where t.otherkeyid is not null or
      not exists (select 1
                  from #tempx1 t2
                  where t2.foo = t.foo and t2.otherkeyid is not null
                 );

Upvotes: 1

Related Questions