user90831
user90831

Reputation: 171

count distinct in SAS SQL case when

I have a data set

Outlet    Period     Brand    Sales
A          Jan         XX       12
A          Jan         XY       13
A          FEB         AB       10
B          JAN         AC       19

I would like to count the number of distinct brands for each period within each period, but excluding a brand 'CD'. I would like to inquire does the count distinct not work in a single line as shown below, but would only work on example 2?

Example 1 (Brand CD is counted even though it should not be)

PROC SQL;
      CREATE TABLE Test AS 
            SELECT      
                  Outlet, Period, Brand,

              case when Brand not in ('CD') then count (distinct Brand)  end as k_Brands_Players2

       group by period, outlet;
quit;

Example 2 (Brand CD is correctly not counted)

PROC SQL;
      CREATE TABLE Test AS 
            SELECT      
                  Outlet, Period, Brand,

              case when Brand not in ('CD') then Brand else ' ' end as Brand_Players, 
              count(distinct calculated Brand_Players) as k_Brands_Players


       group by period, outlet;
quit;

The expected output is:

Outlet     Period     Brand     k_Brands_Players
  A          Jan        XX            2
  A          Jan        XY            2
  A          Feb        AS            3
  A          FEB        QW            3
  A          Feb        XY            3
  B          Jan        KW            1
  ....

Upvotes: 2

Views: 5019

Answers (2)

Tom
Tom

Reputation: 51566

The problem in your first query is that you are using the COUNT() aggregate function at the wrong place. You have

case when Brand not in ('CD') then count(distinct Brand) end

So when BRAND is equal to 'CD' you will get a missing value and otherwise you will get the number of distinct Brands, including the 'CD' Brand.

If instead you use this construction:

count(distinct case when Brand not in ('CD') then Brand end)

then the COUNT() function will see the 'CD' value as a missing value and not count it.

Upvotes: 2

India.Rocket
India.Rocket

Reputation: 1235

Try this:-

/*Count distinct will come outside the case when statement*/

PROC SQL;
  CREATE TABLE Test AS 
  SELECT distinct Outlet, Period, Brand, k_Brands_Players
  from
  YOUR_DATASET_NAME a
  LEFT JOIN
  (
        SELECT      
              Outlet, Period, 

          count(distinct(case when Brand not in ('CD') then Brand end)) as k_Brands_Players

        FROM YOUR_DATASET_NAME
   group by 1,2
  ) b
  on a.Outlet=b.Outlet and a.Period=b.Period;
quit;

Let me know if you have any questions

Upvotes: 2

Related Questions