Reputation: 23
I have a table like this:
person_id means the id number for each person, and ICD is the disease name.
Furthermore, this table is quite huge, and there are more than 5 million observations.
I want to make a new table like this:
So far, I tried to use PROC FREQ and PROC TRANSPOSE to make this table, but unfortunately, I got the error message:
the requested table is too large to process.
Is there another way to make this table?
Upvotes: 1
Views: 750
Reputation: 27518
For this answer generate data of 5M+ rows covering ~20K patients.
data have;
call streaminit(20201231);
do id = 1 by 1 until (_n_ >= 5e6);
do index = 1 to rand('integer',500);
length code $6;
x = rand('integer',1,1000);
code = put(x,z6.);
_n_+1;
output;
end;
end;
keep id code;
run;
When data is sorted to have unique id values within a group a simple transpose
+ stdize
can be run.
proc sql;
create view codes_of_ids as
select distinct id,code, 1 as flag from have;
quit;
proc transpose data=codes_of_ids out=want prefix=code_;
by id;
id code;
var flag;
run;
/*
* The above transpose logs
* NOTE: The data set WORK.WANT has 19920 observations and 1002 variables.
*/
proc stdize data=want out=want missing=0 reponly;
var code_:;
run;
If, for some resource reason, the above will not run in your SAS session you can try a DATA step transposition using arrays.
proc sql;
create table allcodes_as_names as
select distinct cats('code_',code) as name length=32
from have
;
data allcodes_as_names;
set allcodes_as_names;
index = _n_;
run;
proc transpose data=allcodes_as_names out=wide_codes_as_names(drop=_name_ where=(0));
id name;
run;
data want;
length name $32 index 8;
call missing(name,index);
declare hash indexOfName(dataset:'allcodes_as_names');
indexOfName.defineKey('name');
indexOfName.defineData('name', 'index');
indexOfName.defineDone();
if 0 then set have wide_codes_as_names;
array codeflags code_:;
declare hash namesOfId();
namesOfId.defineKey('name');
namesOfId.defineData('name');
namesOfId.defineDone();
declare hiter h('namesOfId');
do until (lastrow);
do _i_ = 1 to dim(codeflags);
codeflags(_i_) = 0;
end;
namesOfIds.clear();
do until (last.id);
set have end=lastrow;
by id;
name = cats('code_',code);
namesOfId.replace();
end;
do while(h.next() = 0);
if indexOfName.find() = 0 then
codeflags(index) = 1;
end;
output;
end;
stop;
run;
Upvotes: 1