Nick Knauer
Nick Knauer

Reputation: 4243

Count by Group with condition

I have a dataset like below:

  Col_A    Col_B    Col_C    Col_D     Metric_1
10/1/19        A     NULL        Z            5
10/1/19        A       AB        Z           10
10/1/19        A                 Z           12
10/1/19        B       AC        J            7
10/1/19        B       AB        J           12
10/1/19        B                 J           13

How do I get the count of Col_C where it is not NULL or Blank ''? My dataset contains user level data and I want to pretty much get the count where NULL or Blank is not included.

  Col_A    Col_B    Col_C    Col_D     Metric_1     COUNT_COL_C
10/1/19        A     NULL        Z            5               1 
10/1/19        A       AB        Z           10               1
10/1/19        A                 Z           12               1 
10/1/19        B       AC        J            7               2
10/1/19        B       AB        J           12               2 
10/1/19        B                 J           13               2

My attempt looks like this, but it is saying the result is 3 for both instead of 1 and 2:

Dataset_Result AS (SELECT *, 
CASE WHEN COL_C IS NOT NULL AND COL_C != '' THEN COUNT(*) OVER (PARTITION BY COL_A,COL_B, COL_D) END 
COUNT_COL_C
FROM ORIGINAL_DATASET)

SELECT * FROM Dataset_Result;

Upvotes: 0

Views: 40

Answers (2)

jpw
jpw

Reputation: 44871

You can move the conditional inside the window function:

;WITH Dataset_Result AS (
  SELECT *, 
  SUM(CASE WHEN COL_C IS NOT NULL AND COL_C != '' THEN 1 ELSE 0 END) OVER (PARTITION BY COL_A,COL_B, COL_D) 
COUNT_COL_C
FROM ORIGINAL_DATASET)

SELECT * FROM Dataset_Result;

See http://www.sqlfiddle.com/#!18/cb8e1/2 for an example.

Upvotes: 1

GenericDisplayName
GenericDisplayName

Reputation: 463

Just doing a normal count with a group by and no CTE or window functions should work for what you say you need.

    SELECT Col_A, Col_B, Col_C,Col_D, Metric_1, COUNT(*) 
    FROM ORIGINAL_DATASET
    where COL_C IS NOT NULL AND COL_C != ''
    group by Col_A, Col_B, Col_C,Col_D, Metric_1
UNION 
SELECT Col_A, Col_B, Col_C,Col_D, Metric_1, COUNT(*) 
    FROM ORIGINAL_DATASET
    where COL_C IS NULL AND COL_C = ''
    group by Col_A, Col_B, Col_C,Col_D, Metric_1

EDIT1: Better approach

SELECT  
COUNT(CASE WHEN COL_C IS NOT NULL AND COL_C != '' THEN 1 ELSE NULL END) AS COUNT_COL_C
FROM ORIGINAL_DATASET

Upvotes: 1

Related Questions