Reputation: 171
I have a data set
Outlet Period Brand Sales
A Jan XX 12
A Jan XY 13
A FEB AB 10
B JAN AC 19
I would like to count the number of distinct brands for each period within each period, but excluding a brand 'CD'. I would like to inquire does the count distinct not work in a single line as shown below, but would only work on example 2?
Example 1 (Brand CD is counted even though it should not be)
PROC SQL;
CREATE TABLE Test AS
SELECT
Outlet, Period, Brand,
case when Brand not in ('CD') then count (distinct Brand) end as k_Brands_Players2
group by period, outlet;
quit;
Example 2 (Brand CD is correctly not counted)
PROC SQL;
CREATE TABLE Test AS
SELECT
Outlet, Period, Brand,
case when Brand not in ('CD') then Brand else ' ' end as Brand_Players,
count(distinct calculated Brand_Players) as k_Brands_Players
group by period, outlet;
quit;
The expected output is:
Outlet Period Brand k_Brands_Players
A Jan XX 2
A Jan XY 2
A Feb AS 3
A FEB QW 3
A Feb XY 3
B Jan KW 1
....
Upvotes: 2
Views: 5019
Reputation: 51566
The problem in your first query is that you are using the COUNT() aggregate function at the wrong place. You have
case when Brand not in ('CD') then count(distinct Brand) end
So when BRAND is equal to 'CD' you will get a missing value and otherwise you will get the number of distinct Brands, including the 'CD' Brand.
If instead you use this construction:
count(distinct case when Brand not in ('CD') then Brand end)
then the COUNT() function will see the 'CD' value as a missing value and not count it.
Upvotes: 2
Reputation: 1235
Try this:-
/*Count distinct will come outside the case when statement*/
PROC SQL;
CREATE TABLE Test AS
SELECT distinct Outlet, Period, Brand, k_Brands_Players
from
YOUR_DATASET_NAME a
LEFT JOIN
(
SELECT
Outlet, Period,
count(distinct(case when Brand not in ('CD') then Brand end)) as k_Brands_Players
FROM YOUR_DATASET_NAME
group by 1,2
) b
on a.Outlet=b.Outlet and a.Period=b.Period;
quit;
Let me know if you have any questions
Upvotes: 2