HHJ
HHJ

Reputation: 13

How to retain original value of variable not in BY group when using UPDATE to collapse observations

I have a list of people with medical visits on specific dates, and I've already flagged them for specific diagnoses. Now I'm trying to collapse these flags to create a file with one person/one date per line. I've found a solution that partially works, using UPDATE (found here: SAS collapse dates). However this overwrites the flagged diagnosis code with the last diagnosis code for a person on a specific date. Here is a simplified version of my data:

data have;
input id id_date diag_code $ flag;
datalines;
1 1 a .
1 1 b 1
1 1 c .
1 2 d 1
1 2 e .
1 2 f 1
2 1 g .
2 1 h .
2 1 i 1
2 2 j 1
3 1 k . 
;
run;

data want;
    update have (obs=0) have;
    by id id_date;
run;

Output:

                        diag_
Obs    id    id_date    code     flag

 1      1       1         c        1  
 2      1       2         f        1  
 3      2       1         i        1  
 4      2       2         j        1 
 5      3       1         k        . 

What I'm trying to get is:

                        diag_
Obs    id    id_date    code     flag

 1      1       1         b        1  
 2      1       2         d        1  
 3      2       1         i        1  
 4      2       2         j        1 
 5      3       1         k        . 

So basically, I'd like to keep the diag_code from the first observation that has flag=1. I've tried the RENAME option to prevent overwriting the variable, but since UPDATE first reads in a dataset with 0 observations, the (new) original variable shows up but is empty:

data want;
    update have (obs=0 rename=(diag_code=orig_diag_code)) have;
    by id id_date;
run;

                        orig_
                        diag_              diag_
Obs    id    id_date    code     flag      code

 1      1       1                  1       c  
 2      1       2                  1       f  
 3      2       1                  1       i  
 4      2       2                  1       j
 5      3       1                  .       k   

Any ideas?

Edit: Adding where flag = 1 still erroneously shows the last diagnosis for when multiple flags occur, and does not produce observations on dates when the flag is missing:

data want;
    update have (obs=0) have;
    by id id_date;
    where flag=1;
run;

                        diag_
Obs    id    id_date    code     flag

 1      1       1         b        1  
 2      1       2         f        1  
 3      2       1         i        1  
 4      2       2         j        1  

Upvotes: 1

Views: 185

Answers (2)

Tom
Tom

Reputation: 51611

You need to move the DIAG_CODE variable out of the way so that it is not updated by the UPDATE statement. You need to make a new variable to retain the value found on the first FLAG=1 record. Then re-assign the right value back to DIAG_CODE.

data want;
  update have (obs=0) have(rename=(diag_code=diag_code_orig));
  by id id_date;
  if 0 then keep_diag=diag_code ;
  retain keep_diag ;
  if first.id_date then call missing(keep_diag);
  if flag=1 then keep_diag=coalescec(keep_diag,diag_code_orig);
  diag_code=coalescec(keep_diag,diag_code_orig);
  drop keep_diag diag_code_orig;
run;

You could instead re-merge on the DIAG_CODE value from the first FLAG=1 record in a second step.

data want ;
  update have(obs=0) have;
  by id id_date;
run;

data want ;
  merge want
        have(keep=id id_date diag_code flag rename=(flag=xflag)
             where=(xflag=1 and not missing(diag_code)))
  ;
  by id id_date;
  if first.id_date;
  drop xflag;
run;

Upvotes: 2

Richard
Richard

Reputation: 27508

The first row in a group meeting some criteria can be selected using flag (or state maintenance) variables.

Here are two ways:

  • DOW loop over group, does not require retain or first.
  • Only implicit loop with retain, first. and last.

Code:

* DOW over group with last., no retain or first.;
data want;
  do _n_ = 1 by 1 until (last.id_date);
    set have;
    by id id_date;
    if flag and not flagged then do;
      output;
      flagged = _n_;
    end;
  end;
  if not flagged then output;

  drop flagged;
run;

* Only implicit loop with retain, first. and last.;
data want;
  retain flagged;
  drop flagged;

  set have;
  by id id_date;

  if first.id_date then flagged = .;

  if flag and not flagged then do;
    flagged = 1;
    output;
  end;

  if last.id_date and not flagged then
    output;
run;

Upvotes: 2

Related Questions