Amy Chang
Amy Chang

Reputation: 23

How to create new columns based on value in SAS

I have a table like this:

enter image description here

person_id means the id number for each person, and ICD is the disease name.

Furthermore, this table is quite huge, and there are more than 5 million observations.

I want to make a new table like this:

enter image description here

So far, I tried to use PROC FREQ and PROC TRANSPOSE to make this table, but unfortunately, I got the error message:

the requested table is too large to process.

Is there another way to make this table?

Upvotes: 1

Views: 750

Answers (1)

Richard
Richard

Reputation: 27518

For this answer generate data of 5M+ rows covering ~20K patients.

data have;
  call streaminit(20201231);
  do id = 1 by 1 until (_n_ >= 5e6);
    do index = 1 to rand('integer',500);
      length code $6;
      x = rand('integer',1,1000);
      code = put(x,z6.);
      _n_+1;
      output;
    end;
  end; 
  keep id code;
run;

When data is sorted to have unique id values within a group a simple transpose + stdize can be run.

proc sql;
  create view codes_of_ids as
  select distinct id,code, 1 as flag from have;
quit;

proc transpose data=codes_of_ids out=want prefix=code_;
  by id;
  id code;
  var flag;
run;

/*
 * The above transpose logs
 * NOTE: The data set WORK.WANT has 19920 observations and 1002 variables.
 */

proc stdize data=want out=want missing=0 reponly;
var code_:;
run;

If, for some resource reason, the above will not run in your SAS session you can try a DATA step transposition using arrays.

proc sql; 
  create table allcodes_as_names as
  select distinct cats('code_',code) as name length=32
  from have
  ;

data allcodes_as_names; 
  set allcodes_as_names;
  index = _n_;
run;

proc transpose data=allcodes_as_names out=wide_codes_as_names(drop=_name_ where=(0));
  id name;
run;

data want;
  length name $32 index 8;
  call missing(name,index);

  declare hash indexOfName(dataset:'allcodes_as_names');
  indexOfName.defineKey('name');
  indexOfName.defineData('name', 'index');
  indexOfName.defineDone();

  if 0 then set have wide_codes_as_names;

  array codeflags code_:;

  declare hash namesOfId();
  namesOfId.defineKey('name');
  namesOfId.defineData('name');
  namesOfId.defineDone();

  declare hiter h('namesOfId');

  do until (lastrow);

    do _i_ = 1 to dim(codeflags);
      codeflags(_i_) = 0;
    end;

    namesOfIds.clear();

    do until (last.id);
      set have end=lastrow;
      by id;

      name = cats('code_',code);
      namesOfId.replace();
    end;

    do while(h.next() = 0);
      if indexOfName.find() = 0 then 
        codeflags(index) = 1;
    end;

    output;
  end;

  stop;
run;

Upvotes: 1

Related Questions