user356178
user356178

Reputation:

How to count people in complex age/gender/etc groups?

I got the following Patients table.

HospitalId INT,
GenderId BIT,
Age TINYINT,
DiseaseId SMALLINT

GenderId = 0 is Male

GenderId = 1 is Female

HospitalA has the HospitalId 0

HospitalB has the HospitalId 1

Here's the output I want to produce:

DiseaseId | HospitalA_Male_18-30 | HospitalA_Male_31-40 |
---------------------------------------------------------
0         |   (count here)       |   (count here)       |
1         |   (count here)       |   (count here)       |
2         |   (count here)       |   (count here)       |
3         |   (count here)       |   (count here)       |

(columns continued)

HospitalA_Female_18-30 | HospitalA_Female_31-40 |
-------------------------------------------------
    (count here)       |     (count here)       |
    (count here)       |     (count here)       |
    (count here)       |     (count here)       |
    (count here)       |     (count here)       |

(columns continued)

HospitalB_Male_18-30 | HospitalB_Male_31-40 |
---------------------------------------------
    (count here)     |     (count here)     |
    (count here)     |     (count here)     |
    (count here)     |     (count here)     |
    (count here)     |     (count here)     |

(columns continued)

HospitalB_Female_18-30 | HospitalB_Female_31-40 |
-------------------------------------------------
    (count here)       |     (count here)       |
    (count here)       |     (count here)       |
    (count here)       |     (count here)       |
    (count here)       |     (count here)       |

(9 columns in the result set)

So as you can see I actually need to count, for each disease, how many patients have the disease in each specific group (by hospital, by gender and by age category).

How can such grouping be done (most efficiently) in T-SQL?

Upvotes: 3

Views: 666

Answers (2)

Nikola Markovinović
Nikola Markovinović

Reputation: 19356

You might do it using pivot query:

select * from 
(
  select diseaseid, 
         'Hospital'
         + case hospitalid when 0 then 'A' when 1 then 'B' end
         + '_'
         + case genderid when 1 then 'Female' else 'Male' end
         + '_'
         + case when age between 18 and 30 
              then '18-30' 
              else (case when age between 31 and 40 then '31-40' end) 
              end Title,
         1 Cnt
  from Patients
  where age between 18 and 40
) t
pivot (
  count (Cnt) for Title in (
    [HospitalA_Male_18-30],   [HospitalA_Male_31-40],
    [HospitalA_Female_18-30], [HospitalA_Female_31-40],
    [HospitalB_Male_18-30],   [HospitalB_Male_31-40],
    [HospitalB_Female_18-30], [HospitalB_Female_31-40]
  )
) as Q

UPDATE

As a development of the above solution, you could also move the name parts from the CASE expressions to their own virtual tables and join the Patients table to them:

;with
hospital (hospitalid, hospitalname) as (
  select 0, 'HospitalA' union all
  select 1, 'HospitalB'
),
gender (genderid, gendername) as (
  select 0, 'Male' union all
  select 1, 'Female'
),
agerange (agefrom, ageto) as (
  select 18, 30 union all
  select 31, 40
)
select * from 
(
  select p.diseaseid, 
         h.hospitalname + '_' + g.gendername + '_'
         + rtrim(a.agefrom) + '-' + rtrim(a.ageto) as Title,
         1 Cnt
  from Patients p
    inner join hospital h on p.hospitalid = h.hospitalid
    inner join gender   g on p.genderid   = g.genderid
    inner join agerange a on p.age between a.agefrom and a.ageto
  where p.age between 18 and 40
) t
pivot (
  count (Cnt) for Title in (
    [HospitalA_Male_18-30],   [HospitalA_Male_31-40],
    [HospitalA_Female_18-30], [HospitalA_Female_31-40],
    [HospitalB_Male_18-30],   [HospitalB_Male_31-40],
    [HospitalB_Female_18-30], [HospitalB_Female_31-40]
  )
) as Q

The overhead of adding the subselects and joins is made up for by greater ease of meaintenance:

  • the (meta)data part is separated from the logic part;

  • the name part lists are more convenient to expand as necessary;

  • the concatenation expression is easier to modify in case you need to change the format of the target column names.

Upvotes: 2

EricZ
EricZ

Reputation: 6205

Please try this

SELECT  
    DiseaseId,
    SUM(CASE WHEN HospitalId = 0 AND GenderId=0 AND (Age BETWEEN 18 AND 30)  THEN 1 ELSE 0 END) AS [HospitalA_Male_18-30],
    SUM(CASE WHEN HospitalId = 0 AND GenderId=0 AND (Age BETWEEN 31 AND 40)  THEN 1 ELSE 0 END) AS [HospitalA_Male_31-40],  
    SUM(CASE WHEN HospitalId = 0 AND GenderId=1 AND (Age BETWEEN 18 AND 30)  THEN 1 ELSE 0 END) AS [HospitalA_Female_18-30],    
    ......  
FROM   Patients 
GROUP BY DiseaseId
ORDER BY DiseaseId  

Upvotes: 2

Related Questions