Reputation: 175
I have a dataset that consists of a series of readings made by different people/instruments, of a bunch of different dimensions. It looks like this:
SUBJECT DIM1_1 DIM1_2 DIM1_3 DIM1_4 DIM1_5 DIM2_1 DIM2_2 DIM2_3 DIM3_1 DIM3_2
1 1 . 1 1 2 3 3 3 2 .
2 1 1 . 1 1 2 2 3 1 1
3 2 2 2 . . 1 . . 5 5
... ... ... ... ... ... ... ... ... ... ...
My real dataset contains around 190 dimensions, with up to 5 measures in each one
I have to obey a set of rules to create a new variable for each dimension:
My new variables should look like this:
SUBJECT ... DIM1_X DIM2_X DIM3_X
1 ... . 3 2
2 ... 1 . 1
3 ... 2 1 5
The problem here is that i don't have the same number of measures for each dimension. Also, i could only come up with a lot of IF's (and I mean a LOT, as more measures in a given dimension increases the number of comparisons), so I wonder if there is some easier way to handle this particular problem.
Any help would be apreciated. Thanks in advance.
Upvotes: 0
Views: 127
Reputation: 63434
Easiest way is to transpose it to vertical (one row per DIMx_y), summarize, then set the ones you want missing to missing, then retranspose (and if needed merge back on).
data have;
input SUBJECT DIM1_1 DIM1_2 DIM1_3 DIM1_4 DIM1_5 DIM2_1 DIM2_2 DIM2_3 DIM3_1 DIM3_2;
datalines;
1 1 . 1 1 2 3 3 3 2 .
2 1 1 . 1 1 2 2 3 1 1
3 2 2 2 . . 1 . . 5 5
;;;;
run;
data have_pret;
set have;
array dim_data DIM:;
do _t = 1 to dim(dim_Data); *dim function is not related to the name - it gives # of vars in array;
dim_Group = scan(vname(dim_data[_t]),1,'_');
dim_num = input(scan(vname(dim_data[_t]),2,'_'),BEST12.);
dim_val=dim_data[_t];
output;
end;
keep dim_group dim_num subject dim_val;
run;
proc freq data=have_pret noprint;
by subject dim_group;
tables dim_val/out=want_pret(where=(not missing(dim_val)));
run;
data want_pret2;
set want_pret;
by subject dim_Group;
if percent ne 100 then dim_val=.;
idval = cats(dim_Group,'_X');
if last.dim_Group;
run;
proc transpose data=want_pret2 out=want;
by subject;
id idval;
var dim_val;
run;
Upvotes: 2