Reputation: 13
I have a list of people with medical visits on specific dates, and I've already flagged them for specific diagnoses. Now I'm trying to collapse these flags to create a file with one person/one date per line. I've found a solution that partially works, using UPDATE (found here: SAS collapse dates). However this overwrites the flagged diagnosis code with the last diagnosis code for a person on a specific date. Here is a simplified version of my data:
data have;
input id id_date diag_code $ flag;
datalines;
1 1 a .
1 1 b 1
1 1 c .
1 2 d 1
1 2 e .
1 2 f 1
2 1 g .
2 1 h .
2 1 i 1
2 2 j 1
3 1 k .
;
run;
data want;
update have (obs=0) have;
by id id_date;
run;
Output:
diag_
Obs id id_date code flag
1 1 1 c 1
2 1 2 f 1
3 2 1 i 1
4 2 2 j 1
5 3 1 k .
What I'm trying to get is:
diag_
Obs id id_date code flag
1 1 1 b 1
2 1 2 d 1
3 2 1 i 1
4 2 2 j 1
5 3 1 k .
So basically, I'd like to keep the diag_code from the first observation that has flag=1. I've tried the RENAME option to prevent overwriting the variable, but since UPDATE first reads in a dataset with 0 observations, the (new) original variable shows up but is empty:
data want;
update have (obs=0 rename=(diag_code=orig_diag_code)) have;
by id id_date;
run;
orig_
diag_ diag_
Obs id id_date code flag code
1 1 1 1 c
2 1 2 1 f
3 2 1 1 i
4 2 2 1 j
5 3 1 . k
Any ideas?
Edit: Adding where flag = 1 still erroneously shows the last diagnosis for when multiple flags occur, and does not produce observations on dates when the flag is missing:
data want;
update have (obs=0) have;
by id id_date;
where flag=1;
run;
diag_
Obs id id_date code flag
1 1 1 b 1
2 1 2 f 1
3 2 1 i 1
4 2 2 j 1
Upvotes: 1
Views: 185
Reputation: 51611
You need to move the DIAG_CODE variable out of the way so that it is not updated by the UPDATE statement. You need to make a new variable to retain the value found on the first FLAG=1 record. Then re-assign the right value back to DIAG_CODE.
data want;
update have (obs=0) have(rename=(diag_code=diag_code_orig));
by id id_date;
if 0 then keep_diag=diag_code ;
retain keep_diag ;
if first.id_date then call missing(keep_diag);
if flag=1 then keep_diag=coalescec(keep_diag,diag_code_orig);
diag_code=coalescec(keep_diag,diag_code_orig);
drop keep_diag diag_code_orig;
run;
You could instead re-merge on the DIAG_CODE value from the first FLAG=1 record in a second step.
data want ;
update have(obs=0) have;
by id id_date;
run;
data want ;
merge want
have(keep=id id_date diag_code flag rename=(flag=xflag)
where=(xflag=1 and not missing(diag_code)))
;
by id id_date;
if first.id_date;
drop xflag;
run;
Upvotes: 2
Reputation: 27508
The first row in a group meeting some criteria can be selected using flag (or state maintenance) variables.
Here are two ways:
retain
or first.
retain
, first.
and last.
Code:
* DOW over group with last., no retain or first.;
data want;
do _n_ = 1 by 1 until (last.id_date);
set have;
by id id_date;
if flag and not flagged then do;
output;
flagged = _n_;
end;
end;
if not flagged then output;
drop flagged;
run;
* Only implicit loop with retain, first. and last.;
data want;
retain flagged;
drop flagged;
set have;
by id id_date;
if first.id_date then flagged = .;
if flag and not flagged then do;
flagged = 1;
output;
end;
if last.id_date and not flagged then
output;
run;
Upvotes: 2