Kevin
Kevin

Reputation: 33

SAS: Count number of a particular type of disease with patient data on multiple lines

I have large dataset of a few million patient encounters that include a diagnosis, timestamp, patientID, and demographic information.

We have found that a particular type of disease is frequently comorbid with a common condition.

I would like to count the number of this type of disease that each patient has, and then create a histogram showing how many people have 1,2,3,4, etc. additional diseases.

This is the format of the data.

PatientID   Diagnosis   Date    Gender  Age
1           282.1       1/2/10      F   25
1           282.1       1/2/10      F   87
1           232.1       1/2/10      F   87
1           250.02      1/2/10      F   41
1           125.1       1/2/10      F   46
1           90.1        1/2/10      F   58
2           140         12/15/13    M   57
2           282.1       12/15/13    M   41
2           232.1       12/15/13    M   66
3           601.1       11/19/13    F   58
3           231.1       11/19/13    F   76
3           123.1       11/19/13    F   29
4           601.1       12/30/14    F   81
4           130.1       12/30/14    F   86
5           230.1       1/22/14     M   60
5           282.1       1/22/14     M   46
5           250.02      1/22/14     M   53

Generally, I was thinking of a DO loop, but I'm not sure where to start because there are duplicates in the dataset, like with patient 1 (282.1 is listed twice). I'm not sure how to account for that. Any thoughts?

Target diagnoses to count would be 282.1, 232.1, 250.02. In this example, patient 1 would have a count of 3, patient 2 would have 2, etc.

Edit: This is what I have used, but the output is showing each PatientID on multiple lines in the output.

PROC SQL;
create table want as
select age, gender, patientID,
       count(distinct diagnosis_description) as count
   from dz_prev
   where diagnosis in (282.1, 232.1)
   group by patientID;
quit;

This is what the output table looks like. Why is this patientID showing up so many times?

Obs AGE GENDER PATIENTID count
1 55 Male 107828695 1
2 54 Male 107828695 1
3 54 Male 107828695 1
4 54 Male 107828695 1
5 54 Male 107828695 1

Upvotes: 1

Views: 469

Answers (2)

Tom
Tom

Reputation: 51611

If you include variables that are neither grouping variables or summary statistics then SAS will happily re-merge your summary statistics back with all of the source records. That is why you are getting multiple records. AGE can usually vary if your dataset covers many years. And GENDER can also vary if your data is messy. So for a quick analysis you might try something like this.

create table want as
select patientID
     , min(age) as age_at_onset
     , min(gender) as gender
     , count(distinct diagnosis_description) as count
   from dz_prev
   where diagnosis in (282.1, 232.1)
   group by patientID
;

Upvotes: 3

DomPazz
DomPazz

Reputation: 12465

I think you can get what you want with an SQL statement

PROC SQL NOPRINT;
create table want as
select PatientID,
       count(distinct Diagnosis) as count
   from have
   where Diagnosis in (282.1, 232.1, 250.02)
   group by PatientID;
quit;

This filters to only the diagnoses you are interested in, counts the distinct times they are seen, by the PatientID, and saves the results to a new table.

Upvotes: 3

Related Questions