ybao
ybao

Reputation: 147

Isolate Patients with 2 diagnoses but diagnosis data is on different lines

I have a dataset of patient data with each diagnosis on a different line.

This is an example of what it looks like:

patientID diabetes cancer age gender 
1         1          0     65    M     
1         0          1     65    M     
2         1          1     23    M     
2         0          0     23    M     
3         0          0     50    F     
3         0          0     50    F

I need to isolate the patients who have a diagnosis of both diabetes and cancer; their unique patient identifier is patientID. Sometimes they are both on the same line, sometimes they aren't. I am not sure how to do this because the information is on multiple lines.

How would I go about doing this?

This is what I have so far:

PROC SQL;
create table want as
select patientID
       , max(diabetes) as diabetes
       , max(cancer) as cancer
       , min(DOB) as DOB
   from diab_dx

   group by patientID;
quit;

data final; set want;
if diabetes GE 1 AND cancer GE 1 THEN both = 1;
else both =0;

run;

proc freq data=final;
tables both;
run;

Is this correct?

Upvotes: 3

Views: 69

Answers (3)

Richard
Richard

Reputation: 27508

You might find some coders, especially those coming from statistical backgrounds, are more likely to use Proc MEANS instead of SQL or DATA step to compute the diagnostic flag maximums.

proc means noprint data=have;
  by patientID;
  output out=want 
    max(diabetes) = diabetes 
    max(cancer) = cancer
    min(age) = age
  ;
run;

or for the case of all the same aggregation function

proc means noprint data=have;
  by patientID;
  var diabetes cancer;
  output out=want max=  ;
run;

or

proc means noprint data=have;
  by patientID;
  var diabetes cancer age; 
  output out=want max= / autoname;
run;

Upvotes: 1

Kiran
Kiran

Reputation: 3315

add a having statement at the end of sql query should do.

  PROC SQL;
  create table want as
  select patientID
   , max(diabetes) as diabetes
   , max(cancer) as cancer
   , min(age) as DOB
  from PAT

  group by patientID
  having calculated diabetes ge 1 and calculated cancer ge 1;
 quit;

Upvotes: 1

data _null_
data _null_

Reputation: 9109

If you want to learn about data steps lookup how this works.

data pat;
   input patientID diabetes cancer age gender:$1.; 
   cards;
1         1          0     65    M     
1         0          1     65    M     
2         1          1     23    M     
2         0          0     23    M     
3         0          0     50    F     
3         0          0     50    F
;;;;
   run;
data both;
   do until(last.patientid);
      set pat; by patientid;
      _diabetes = max(diabetes,_diabetes);
      _cancer   = max(cancer,_cancer);
      end;
   both = _diabetes and _cancer;
   run;
proc print;
   run;

Upvotes: 2

Related Questions