Reputation: 13
I'm trying to run a code that will summarise a list of clients based on age categories. I have the client ID and their ages, and am using a CASE WHEN to group the ages into buckets, then trying to count the client ids in each of those buckets.
For info - the Assisted Date field is just the date on which the client was served and this is just included in my testing to keep the results small - so I'm just focusing on Feb 2019 services that aren't Assessments; and the SIR ID is the location at which the service was provided.
select distinct
CASE
when CD.Age between 0 and 5 then '0-5'
when CD.Age between 6 and 11 then '6-11'
when CD.Age between 12 and 14 then '12-14'
when CD.Age between 15 and 17 then '15-17'
when CD.Age between 18 and 24 then '18-24'
when CD.Age between 25 and 54 then '25-54'
when CD.Age between 55 and 64 then '55-64'
when CD.Age > 65 then '65+'
else 'Unknown'
END AS Age_Group,
count(distinct AP.Source_Individual_ID) as "Count"
from dm.Assistance_Provided AP, rpt.ClientsDemographics CD
where CD.Source_Individual_ID = AP.Source_Individual_ID
and AP.SIR_ID = '1909'
and AP.Service_Provided <> 'Assessment'
and year(AP.Assisted_Date) = '2019'
and month(AP.Assisted_Date) = 2
group by CD.Age
If I run the code excluding the count(distinct) I end up with 17 rows of data. The 17 clients are all aged between 18-24 so the Age-Group for all clients come back as 18-24. So the Case When appears to be operating correctly.
However, if I run the code with the count(distinct), I end up with 3 rows of data. the Age Group 18-24 is listed 3 times and the Count as 2, 4, and 5 respectively for the 3 lines. I definitely have 17 different IDs in the list, but for some reason this is being reduced to a total of 11 when I add the count(distinct).
I've also tried removing the distinct from the select i.e. so
select
CASE
when CD.Age between ....
If I do that I get 5 rows - still all with age group 18-24, but then with 2,4,5,4,2 respectively - so then back to my total of 17. But I'm not sure why I'm not just getting one line back : Age Group: 18-24; Count: 17
What am I doing wrong?
Upvotes: 0
Views: 824
Reputation: 1346
You need to put your case statement in your group by
. As it is, since you group by CD.Age
it will write a single row for every distinct age. In other words if you have two rows with Age = 12
and one row with Age = 13
you'll get two rows, both with 12-14
as their age group, but with a count of 2
for the first and 1
for the second. To make it more confusing, if you only had one row each, then because both rows are the same and you have the distinct
clause it would only return 1 row with a count of 1.
If you put your case statement into the group by, then it will group by each distinct Age_Group
instead. i.e., it does the case transformation first, then groups by it.
So try:
select
CASE
when CD.Age between 0 and 5 then '0-5'
when CD.Age between 6 and 11 then '6-11'
when CD.Age between 12 and 14 then '12-14'
when CD.Age between 15 and 17 then '15-17'
when CD.Age between 18 and 24 then '18-24'
when CD.Age between 25 and 54 then '25-54'
when CD.Age between 55 and 64 then '55-64'
when CD.Age > 65 then '65+'
else 'Unknown'
END AS Age_Group,
count(distinct AP.Source_Individual_ID) as "Number of Clients Assisted"
from dm.Assistance_Provided AP, rpt.ClientsDemographics CD
where CD.Source_Individual_ID = AP.Source_Individual_ID
and AP.SIR_ID = '1909'
and AP.Service_Provided <> 'Assessment'
and year(AP.Assisted_Date) = '2019'
and month(AP.Assisted_Date) = 2
group by CASE
when CD.Age between 0 and 5 then '0-5'
when CD.Age between 6 and 11 then '6-11'
when CD.Age between 12 and 14 then '12-14'
when CD.Age between 15 and 17 then '15-17'
when CD.Age between 18 and 24 then '18-24'
when CD.Age between 25 and 54 then '25-54'
when CD.Age between 55 and 64 then '55-64'
when CD.Age > 65 then '65+'
else 'Unknown'
END
Upvotes: 1
Reputation: 521093
I suspect that you should actually be aggregating by Source_Individual
, and then taking conditional counts everywhere:
SELECT
CD.Source_Individual_ID,
COUNT(CASE WHEN CD.Age BETWEEN 0 AND 5 THEN 1 END) AS [0-5],
COUNT(CASE WHEN CD.Age BETWEEN 6 AND 11 THEN 1 END) AS [6-11],
COUNT(CASE WHEN CD.Age BETWEEN 12 AND 14 THEN 1 END) AS [12-14],
COUNT(CASE WHEN CD.Age BETWEEN 15 AND 17 THEN 1 END) AS [15-17],
COUNT(CASE WHEN CD.Age BETWEEN 18 AND 24 THEN 1 END) AS [18-24],
COUNT(CASE WHEN CD.Age BETWEEN 25 AND 54 THEN 1 END) AS [25-54],
COUNT(CASE WHEN CD.Age BETWEEN 55 AND 64 THEN 1 END) AS [55-64]
COUNT(CASE WHEN CD.Age > 65 THEN 1 END) AS [65+],
COUNT(*) AS [Number of Clients Assisted]
FROM dm.Assistance_Provided AP
INNER JOIN rpt.ClientsDemographics CD
ON CD.Source_Individual_ID = AP.Source_Individual_ID
WHERE
AP.SIR_ID = '1909' AND
AP.Service_Provided <> 'Assessment' AND
YEAR(AP.Assisted_Date) = 2019 AND
MONTH(AP.Assisted_Date) = 2
GROUP BY
CD.Source_Individual_ID;
Note that I have rewritten your query to use explicit, modern, inner joins, rather than the implicit joins you were using.
Upvotes: 1