Reputation: 147
I have a dataset of patient data with each diagnosis on a different line.
This is an example of what it looks like:
patientID diabetes cancer age gender
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
I need to isolate the patients who have a diagnosis of both diabetes and cancer; their unique patient identifier is patientID. Sometimes they are both on the same line, sometimes they aren't. I am not sure how to do this because the information is on multiple lines.
How would I go about doing this?
This is what I have so far:
PROC SQL;
create table want as
select patientID
, max(diabetes) as diabetes
, max(cancer) as cancer
, min(DOB) as DOB
from diab_dx
group by patientID;
quit;
data final; set want;
if diabetes GE 1 AND cancer GE 1 THEN both = 1;
else both =0;
run;
proc freq data=final;
tables both;
run;
Is this correct?
Upvotes: 3
Views: 69
Reputation: 27508
You might find some coders, especially those coming from statistical backgrounds, are more likely to use Proc MEANS
instead of SQL or DATA step to compute the diagnostic flag maximums.
proc means noprint data=have;
by patientID;
output out=want
max(diabetes) = diabetes
max(cancer) = cancer
min(age) = age
;
run;
or for the case of all the same aggregation function
proc means noprint data=have;
by patientID;
var diabetes cancer;
output out=want max= ;
run;
or
proc means noprint data=have;
by patientID;
var diabetes cancer age;
output out=want max= / autoname;
run;
Upvotes: 1
Reputation: 3315
add a having statement at the end of sql query should do.
PROC SQL;
create table want as
select patientID
, max(diabetes) as diabetes
, max(cancer) as cancer
, min(age) as DOB
from PAT
group by patientID
having calculated diabetes ge 1 and calculated cancer ge 1;
quit;
Upvotes: 1
Reputation: 9109
If you want to learn about data steps lookup how this works.
data pat;
input patientID diabetes cancer age gender:$1.;
cards;
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
;;;;
run;
data both;
do until(last.patientid);
set pat; by patientid;
_diabetes = max(diabetes,_diabetes);
_cancer = max(cancer,_cancer);
end;
both = _diabetes and _cancer;
run;
proc print;
run;
Upvotes: 2