J O'Donnell
J O'Donnell

Reputation: 1

Unexpected number of rows returned in case when query

I'm simply trying to profile a table, using a basic case when statement, to count the number of customers who fall into a band, and further split by another column.

My SQL Server query is returning multiple rows of the same definition however. I would expect 4 rows for the following 2 bands bands, but instead I'm getting multiple rows for all the risk band definitions.

Example of returning multiple rows for A1a and A1b, when would expect 4 rows for this - count of id, risk band (20 variables) and indicator (Y/N):

(No column name) SRIRiskBands HighlyLikelyIndicator
31423 A1b N
4702 A1b Y
29427 A1b N
4788 A1b Y
27854 A1b N
4543 A1b Y
4349 A1b Y
26029 A1b N
4291 A1a Y
25459 A1a N
24352 A1a N
4100 A1a Y
3875 A1a Y
22689 A1a N
4130 A1a Y
22454 A1a N
20619 A1a N

Code used:

select count(distinct involvedPartyId),
case when MachineLearningScores >= 934.50 THEN 'A1a'
when MachineLearningScores >= 920.50 and MachineLearningScores <= 934.49 THEN 'A1b'
when MachineLearningScores >= 907.50 and MachineLearningScores <= 920.49 THEN 'A1c'
when MachineLearningScores >= 895.50 and MachineLearningScores <= 907.49 THEN 'A1d'
when MachineLearningScores >= 882.50 and MachineLearningScores <= 895.49 THEN 'A1e'
when MachineLearningScores >= 869.50 and MachineLearningScores <= 882.49 THEN 'A1f'
when MachineLearningScores >= 859.50 and MachineLearningScores <= 869.49 THEN 'A1g'
when MachineLearningScores >= 848.50 and MachineLearningScores <= 859.49 THEN 'A1h'
when MachineLearningScores >= 839.50 and MachineLearningScores <= 848.49 THEN 'A1i'
when MachineLearningScores >= 830.50 and MachineLearningScores <= 839.49 THEN 'A2'
when MachineLearningScores >= 809.50 and MachineLearningScores <= 830.49 THEN 'A3'
when MachineLearningScores >= 796.50 and MachineLearningScores <= 809.49 THEN 'A4'
when MachineLearningScores >= 789.50 and MachineLearningScores <= 796.49 THEN 'A5'
when MachineLearningScores >= 767.50 and MachineLearningScores <= 789.49 THEN 'B1'
when MachineLearningScores >= 752.50 and MachineLearningScores <= 767.49 THEN 'B2'
when MachineLearningScores >= 733.50 and MachineLearningScores <= 752.49 THEN 'B3'
when MachineLearningScores >= 724.50 and MachineLearningScores <= 733.49 THEN 'C1'
when MachineLearningScores >= 685.50 and MachineLearningScores <= 724.49 THEN 'D1'
when MachineLearningScores >= 649.50 and MachineLearningScores <= 685.49 THEN 'E1'
when MachineLearningScores < 649.49 THEN 'F1'
ELSE 'No MachineLearningScores'
    END AS 'SRIRiskBand', HighlyLikelyIndicator
from MarketingAnalytics.dbo.Ri_PA_Personal
where batchid = (SELECT MAX(BatchId) FROM MarketingAnalytics.dbo.Ri_PA_Personal)
group by MachineLearningScores, HighlyLikelyIndicator
order by MachineLearningScores asc

Any help would be greatly appreciated.

Upvotes: 0

Views: 61

Answers (2)

SQLpro
SQLpro

Reputation: 5169

Something like :

WITH 
vals AS 
(SELECT * 
 FROM   (VALUES (934.50, 'A1a'),
                (920.50, 'A1b'),
                (907.50, 'A1c'),
                (895.50, 'A1d'),
                (882.50, 'A1e'),
                (869.50, 'A1f'),
                (859.50, 'A1g'),
                (848.50, 'A1h'),
                (839.50, 'A1i'),
                (830.50, 'A2'),
                (809.50, 'A3'),
                (796.50, 'A4'),
                (789.50, 'A5'),
                (767.50, 'B1'),
                (752.50, 'B2'),
                (733.50, 'B3'),
                (724.50, 'C1'),
                (685.50, 'D1'),
                (649.50, 'E1'),
                (649.49, 'F1')
        ) AS VR (val,    ref ) 
)
SELECT count(distinct involvedPartyId) AS CNT, 
       CASE 
          WHEN LEAD(val) OVER(ORDER BY val) IS NULL THEN 'F1' 
          WHEN LAG(val) OVER(ORDER BY val) IS NULL THEN 'A1a' 
          ELSE ref
       END AS SRIRiskBandLAG
FROM   MarketingAnalytics.dbo.Ri_PA_Personal AS RPP
       LEFT OUTER JOIN vals AS v 
          ON MachineLearningScores < v.val 
             AND >= MachineLearningScores >= LAG(val) OVER(ORDER BY val)
where batchid = (SELECT MAX(BatchId) FROM MarketingAnalytics.dbo.Ri_PA_Personal)
group by val, ref, HighlyLikelyIndicator
order by val asc

Upvotes: 0

This result happens because the "group by" clause is using the original field without treating it. You can solve this by using the "group by" clause by copying the "case when" clause or by using a subselect to group

Solution 1

select count(distinct involvedPartyId),
case when MachineLearningScores >= 934.50 THEN 'A1a'
when MachineLearningScores >= 920.50 and MachineLearningScores <= 934.49 THEN 'A1b'
when MachineLearningScores >= 907.50 and MachineLearningScores <= 920.49 THEN 'A1c'
when MachineLearningScores >= 895.50 and MachineLearningScores <= 907.49 THEN 'A1d'
when MachineLearningScores >= 882.50 and MachineLearningScores <= 895.49 THEN 'A1e'
when MachineLearningScores >= 869.50 and MachineLearningScores <= 882.49 THEN 'A1f'
when MachineLearningScores >= 859.50 and MachineLearningScores <= 869.49 THEN 'A1g'
when MachineLearningScores >= 848.50 and MachineLearningScores <= 859.49 THEN 'A1h'
when MachineLearningScores >= 839.50 and MachineLearningScores <= 848.49 THEN 'A1i'
when MachineLearningScores >= 830.50 and MachineLearningScores <= 839.49 THEN 'A2'
when MachineLearningScores >= 809.50 and MachineLearningScores <= 830.49 THEN 'A3'
when MachineLearningScores >= 796.50 and MachineLearningScores <= 809.49 THEN 'A4'
when MachineLearningScores >= 789.50 and MachineLearningScores <= 796.49 THEN 'A5'
when MachineLearningScores >= 767.50 and MachineLearningScores <= 789.49 THEN 'B1'
when MachineLearningScores >= 752.50 and MachineLearningScores <= 767.49 THEN 'B2'
when MachineLearningScores >= 733.50 and MachineLearningScores <= 752.49 THEN 'B3'
when MachineLearningScores >= 724.50 and MachineLearningScores <= 733.49 THEN 'C1'
when MachineLearningScores >= 685.50 and MachineLearningScores <= 724.49 THEN 'D1'
when MachineLearningScores >= 649.50 and MachineLearningScores <= 685.49 THEN 'E1'
when MachineLearningScores < 649.49 THEN 'F1'
ELSE 'No MachineLearningScores'
    END AS 'SRIRiskBand', HighlyLikelyIndicator
from MarketingAnalytics.dbo.Ri_PA_Personal
where batchid = (SELECT MAX(BatchId) FROM MarketingAnalytics.dbo.Ri_PA_Personal)
group by case when MachineLearningScores >= 934.50 THEN 'A1a'
when MachineLearningScores >= 920.50 and MachineLearningScores <= 934.49 THEN 'A1b'
when MachineLearningScores >= 907.50 and MachineLearningScores <= 920.49 THEN 'A1c'
when MachineLearningScores >= 895.50 and MachineLearningScores <= 907.49 THEN 'A1d'
when MachineLearningScores >= 882.50 and MachineLearningScores <= 895.49 THEN 'A1e'
when MachineLearningScores >= 869.50 and MachineLearningScores <= 882.49 THEN 'A1f'
when MachineLearningScores >= 859.50 and MachineLearningScores <= 869.49 THEN 'A1g'
when MachineLearningScores >= 848.50 and MachineLearningScores <= 859.49 THEN 'A1h'
when MachineLearningScores >= 839.50 and MachineLearningScores <= 848.49 THEN 'A1i'
when MachineLearningScores >= 830.50 and MachineLearningScores <= 839.49 THEN 'A2'
when MachineLearningScores >= 809.50 and MachineLearningScores <= 830.49 THEN 'A3'
when MachineLearningScores >= 796.50 and MachineLearningScores <= 809.49 THEN 'A4'
when MachineLearningScores >= 789.50 and MachineLearningScores <= 796.49 THEN 'A5'
when MachineLearningScores >= 767.50 and MachineLearningScores <= 789.49 THEN 'B1'
when MachineLearningScores >= 752.50 and MachineLearningScores <= 767.49 THEN 'B2'
when MachineLearningScores >= 733.50 and MachineLearningScores <= 752.49 THEN 'B3'
when MachineLearningScores >= 724.50 and MachineLearningScores <= 733.49 THEN 'C1'
when MachineLearningScores >= 685.50 and MachineLearningScores <= 724.49 THEN 'D1'
when MachineLearningScores >= 649.50 and MachineLearningScores <= 685.49 THEN 'E1'
when MachineLearningScores < 649.49 THEN 'F1'
ELSE 'No MachineLearningScores'
    END AS 'SRIRiskBand', HighlyLikelyIndicator
order by case when MachineLearningScores >= 934.50 THEN 'A1a'
when MachineLearningScores >= 920.50 and MachineLearningScores <= 934.49 THEN 'A1b'
when MachineLearningScores >= 907.50 and MachineLearningScores <= 920.49 THEN 'A1c'
when MachineLearningScores >= 895.50 and MachineLearningScores <= 907.49 THEN 'A1d'
when MachineLearningScores >= 882.50 and MachineLearningScores <= 895.49 THEN 'A1e'
when MachineLearningScores >= 869.50 and MachineLearningScores <= 882.49 THEN 'A1f'
when MachineLearningScores >= 859.50 and MachineLearningScores <= 869.49 THEN 'A1g'
when MachineLearningScores >= 848.50 and MachineLearningScores <= 859.49 THEN 'A1h'
when MachineLearningScores >= 839.50 and MachineLearningScores <= 848.49 THEN 'A1i'
when MachineLearningScores >= 830.50 and MachineLearningScores <= 839.49 THEN 'A2'
when MachineLearningScores >= 809.50 and MachineLearningScores <= 830.49 THEN 'A3'
when MachineLearningScores >= 796.50 and MachineLearningScores <= 809.49 THEN 'A4'
when MachineLearningScores >= 789.50 and MachineLearningScores <= 796.49 THEN 'A5'
when MachineLearningScores >= 767.50 and MachineLearningScores <= 789.49 THEN 'B1'
when MachineLearningScores >= 752.50 and MachineLearningScores <= 767.49 THEN 'B2'
when MachineLearningScores >= 733.50 and MachineLearningScores <= 752.49 THEN 'B3'
when MachineLearningScores >= 724.50 and MachineLearningScores <= 733.49 THEN 'C1'
when MachineLearningScores >= 685.50 and MachineLearningScores <= 724.49 THEN 'D1'
when MachineLearningScores >= 649.50 and MachineLearningScores <= 685.49 THEN 'E1'
when MachineLearningScores < 649.49 THEN 'F1'
ELSE 'No MachineLearningScores'
    END AS 'SRIRiskBand' asc

Solution 2

select count(involvedPartyId),
    SRIRiskBand,
    HighlyLikelyIndicator
from
(
    select involvedPartyId,
    case when MachineLearningScores >= 934.50 THEN 'A1a'
    when MachineLearningScores >= 920.50 and MachineLearningScores <= 934.49 THEN 'A1b'
    when MachineLearningScores >= 907.50 and MachineLearningScores <= 920.49 THEN 'A1c'
    when MachineLearningScores >= 895.50 and MachineLearningScores <= 907.49 THEN 'A1d'
    when MachineLearningScores >= 882.50 and MachineLearningScores <= 895.49 THEN 'A1e'
    when MachineLearningScores >= 869.50 and MachineLearningScores <= 882.49 THEN 'A1f'
    when MachineLearningScores >= 859.50 and MachineLearningScores <= 869.49 THEN 'A1g'
    when MachineLearningScores >= 848.50 and MachineLearningScores <= 859.49 THEN 'A1h'
    when MachineLearningScores >= 839.50 and MachineLearningScores <= 848.49 THEN 'A1i'
    when MachineLearningScores >= 830.50 and MachineLearningScores <= 839.49 THEN 'A2'
    when MachineLearningScores >= 809.50 and MachineLearningScores <= 830.49 THEN 'A3'
    when MachineLearningScores >= 796.50 and MachineLearningScores <= 809.49 THEN 'A4'
    when MachineLearningScores >= 789.50 and MachineLearningScores <= 796.49 THEN 'A5'
    when MachineLearningScores >= 767.50 and MachineLearningScores <= 789.49 THEN 'B1'
    when MachineLearningScores >= 752.50 and MachineLearningScores <= 767.49 THEN 'B2'
    when MachineLearningScores >= 733.50 and MachineLearningScores <= 752.49 THEN 'B3'
    when MachineLearningScores >= 724.50 and MachineLearningScores <= 733.49 THEN 'C1'
    when MachineLearningScores >= 685.50 and MachineLearningScores <= 724.49 THEN 'D1'
    when MachineLearningScores >= 649.50 and MachineLearningScores <= 685.49 THEN 'E1'
    when MachineLearningScores < 649.49 THEN 'F1'
    ELSE 'No MachineLearningScores'
        END AS 'SRIRiskBand', HighlyLikelyIndicator
    from MarketingAnalytics.dbo.Ri_PA_Personal
    where batchid = (SELECT MAX(BatchId) FROM MarketingAnalytics.dbo.Ri_PA_Personal
) AS temp
group by SRIRiskBand,
    HighlyLikelyIndicator

Upvotes: 0

Related Questions