Reputation:
I got the following Patients
table.
HospitalId INT,
GenderId BIT,
Age TINYINT,
DiseaseId SMALLINT
GenderId
= 0 is Male
GenderId
= 1 is Female
HospitalA has the HospitalId
0
HospitalB has the HospitalId
1
Here's the output I want to produce:
DiseaseId | HospitalA_Male_18-30 | HospitalA_Male_31-40 |
---------------------------------------------------------
0 | (count here) | (count here) |
1 | (count here) | (count here) |
2 | (count here) | (count here) |
3 | (count here) | (count here) |
(columns continued)
HospitalA_Female_18-30 | HospitalA_Female_31-40 |
-------------------------------------------------
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(columns continued)
HospitalB_Male_18-30 | HospitalB_Male_31-40 |
---------------------------------------------
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(columns continued)
HospitalB_Female_18-30 | HospitalB_Female_31-40 |
-------------------------------------------------
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(count here) | (count here) |
(9 columns in the result set)
So as you can see I actually need to count, for each disease, how many patients have the disease in each specific group (by hospital, by gender and by age category).
How can such grouping be done (most efficiently) in T-SQL?
Upvotes: 3
Views: 666
Reputation: 19356
You might do it using pivot query:
select * from
(
select diseaseid,
'Hospital'
+ case hospitalid when 0 then 'A' when 1 then 'B' end
+ '_'
+ case genderid when 1 then 'Female' else 'Male' end
+ '_'
+ case when age between 18 and 30
then '18-30'
else (case when age between 31 and 40 then '31-40' end)
end Title,
1 Cnt
from Patients
where age between 18 and 40
) t
pivot (
count (Cnt) for Title in (
[HospitalA_Male_18-30], [HospitalA_Male_31-40],
[HospitalA_Female_18-30], [HospitalA_Female_31-40],
[HospitalB_Male_18-30], [HospitalB_Male_31-40],
[HospitalB_Female_18-30], [HospitalB_Female_31-40]
)
) as Q
UPDATE
As a development of the above solution, you could also move the name parts from the CASE expressions to their own virtual tables and join the Patients
table to them:
;with
hospital (hospitalid, hospitalname) as (
select 0, 'HospitalA' union all
select 1, 'HospitalB'
),
gender (genderid, gendername) as (
select 0, 'Male' union all
select 1, 'Female'
),
agerange (agefrom, ageto) as (
select 18, 30 union all
select 31, 40
)
select * from
(
select p.diseaseid,
h.hospitalname + '_' + g.gendername + '_'
+ rtrim(a.agefrom) + '-' + rtrim(a.ageto) as Title,
1 Cnt
from Patients p
inner join hospital h on p.hospitalid = h.hospitalid
inner join gender g on p.genderid = g.genderid
inner join agerange a on p.age between a.agefrom and a.ageto
where p.age between 18 and 40
) t
pivot (
count (Cnt) for Title in (
[HospitalA_Male_18-30], [HospitalA_Male_31-40],
[HospitalA_Female_18-30], [HospitalA_Female_31-40],
[HospitalB_Male_18-30], [HospitalB_Male_31-40],
[HospitalB_Female_18-30], [HospitalB_Female_31-40]
)
) as Q
The overhead of adding the subselects and joins is made up for by greater ease of meaintenance:
the (meta)data part is separated from the logic part;
the name part lists are more convenient to expand as necessary;
the concatenation expression is easier to modify in case you need to change the format of the target column names.
Upvotes: 2
Reputation: 6205
Please try this
SELECT
DiseaseId,
SUM(CASE WHEN HospitalId = 0 AND GenderId=0 AND (Age BETWEEN 18 AND 30) THEN 1 ELSE 0 END) AS [HospitalA_Male_18-30],
SUM(CASE WHEN HospitalId = 0 AND GenderId=0 AND (Age BETWEEN 31 AND 40) THEN 1 ELSE 0 END) AS [HospitalA_Male_31-40],
SUM(CASE WHEN HospitalId = 0 AND GenderId=1 AND (Age BETWEEN 18 AND 30) THEN 1 ELSE 0 END) AS [HospitalA_Female_18-30],
......
FROM Patients
GROUP BY DiseaseId
ORDER BY DiseaseId
Upvotes: 2