Transposing table while collapsing duplicate observations per BY group

Question

I have a dataset with diagnosis records, where a patient can have one or more records even for same code. I am unable to use group by variable 'code' since it shows error similar as The ID value "code_v58" occurs twice in the same BY group.

data have;
    input id rand found  code $;
    datalines;            
    1   101      1      001
    2   102      1      v58
    2   103      0      v58  /* second diagnosis record for patient 2 */
    3   104      1      v58
    4   105      1      003
    4   106      1      003  /* second diagnosis record for patient 4 */
    5   107      0      v58
    ;

Desired output:
Obs id code_001 code_v58 code_003
1    1   1 . .
2    2   . 1 . /* second diagnosis code's {v58} status for patient 2 is 1, so it has to be taken*/
3    3   . 1 .
4    4   . . 1
5    5   . 0 .

When I tried with let statement like [this],

proc transpose data=temp out=want(drop=_name_) prefix=code_ let;
  by id;
  id code;   * column name becomes ;
  var found;
run;


I got output as below:
Obs id code_001 code_v58 code_003
1    1   1 . .
2    2   . 0 .
3    3   . 1 .
4    4   . . 1
5    5   . 0 .

I tried this and modified PROC TRANSPOSE to use ID and count in the BY statement
proc transpose data=temp out=want(drop=_name_) prefix=code_;
      by id count;
      id code;   * column name becomes ;
      var found;
    run;

and got output like below:
   Obs id count code_001 code_v58 code_003
    1   1  1     1 . .
    2   2  1     . 1 .
    3   2  2     . 0 .
    4   3  1     . 1 .
    5   4  1     . . 1
    6   4  2     . . 1
    7   5  1     . 0 .

May I know how to remove duplicate patient ids and update the code to 1 if found in any records?

Richard · Accepted Answer

You can transpose a group aggregate view.

proc sql; 
  create view have_v as
  select id, code, max(found) as found
  from have
  group by id, code
  order by id, code
;

proc transpose data=have_v out=want prefix=code_;
  by id;
  id code;
  var found;
run;

Follow up with Proc STDIZE (thanks @Reeza) if you want to replace the missing values (.) with 0

proc stdize data=want out=want missing=0 reponly;
var code_:;
run;

Transposing table while collapsing duplicate observations per BY group

Answers (2)

Related Questions