DOT
DOT

Reputation: 329

Transposing table while collapsing duplicate observations per BY group

I have a dataset with diagnosis records, where a patient can have one or more records even for same code. I am unable to use group by variable 'code' since it shows error similar as The ID value "code_v58" occurs twice in the same BY group.

data have;
    input id rand found  code $;
    datalines;            
    1   101      1      001
    2   102      1      v58
    2   103      0      v58  /* second diagnosis record for patient 2 */
    3   104      1      v58
    4   105      1      003
    4   106      1      003  /* second diagnosis record for patient 4 */
    5   107      0      v58
    ;

Desired output:
Obs id code_001 code_v58 code_003
1    1   1 . .
2    2   . 1 . /* second diagnosis code's {v58} status for patient 2 is 1, so it has to be taken*/
3    3   . 1 .
4    4   . . 1
5    5   . 0 .

When I tried with let statement like [this],

proc transpose data=temp out=want(drop=_name_) prefix=code_ let;
  by id;
  id code;   * column name becomes <prefix><code>;
  var found;
run;

I got output as below:

Obs id code_001 code_v58 code_003
1    1   1 . .
2    2   . 0 .
3    3   . 1 .
4    4   . . 1
5    5   . 0 .

I tried this and modified PROC TRANSPOSE to use ID and count in the BY statement

proc transpose data=temp out=want(drop=_name_) prefix=code_;
      by id count;
      id code;   * column name becomes <prefix><code>;
      var found;
    run;

and got output like below:

   Obs id count code_001 code_v58 code_003
    1   1  1     1 . .
    2   2  1     . 1 .
    3   2  2     . 0 .
    4   3  1     . 1 .
    5   4  1     . . 1
    6   4  2     . . 1
    7   5  1     . 0 .

May I know how to remove duplicate patient ids and update the code to 1 if found in any records?

Upvotes: 1

Views: 494

Answers (2)

Richard
Richard

Reputation: 27508

You can transpose a group aggregate view.

proc sql; 
  create view have_v as
  select id, code, max(found) as found
  from have
  group by id, code
  order by id, code
;

proc transpose data=have_v out=want prefix=code_;
  by id;
  id code;
  var found;
run;

Follow up with Proc STDIZE (thanks @Reeza) if you want to replace the missing values (.) with 0

proc stdize data=want out=want missing=0 reponly;
var code_:;
run;

Upvotes: 2

Joe
Joe

Reputation: 63434

Seems to me that you want something like this - first preprocess the data to get the value you want for FOUND, then transpose (if you actually need to). The TABULATE does what it seems like you want to do for FOUND (take the max value of it, 1 if present, 0 if only 0s are present, missing otherwise), and then TRANSPOSE that the same way you were doing before.

proc tabulate data=have out=tab;
class id code;
var found;    
tables id,code*found*max;
run;

proc transpose data=tab out=want prefix=code_;
  by id;
  id code;
  var found_max;
run;

  

Upvotes: 2

Related Questions