itzy
itzy

Reputation: 11765

In SAS, use one variable to choose another variable to change

I have a data set with variables col1-col5 that contain an numeric identifier. There are 2000 possible values of the identifier. I also have 2000 variables named dX, where X is one of the 2000 different identifier values.

I want to loop over the col variables and then set the corresponding d variable that is indexed by the identifier to equal 1.

For example, suppose I have the observation:

col1  col2    col3   col4   col5   d10007   d10010   d10031 ... d10057 ...
10031 10057      .      .      .        .        .        .          .

I would want to set d10031 and d10057 to both equal 1.

Is this possible? If the numbers were sequential I see how to use an array, but given that they're not I can't see how to do it.

Upvotes: 3

Views: 592

Answers (2)

Tim Sands
Tim Sands

Reputation: 1068

Another alternative is to just create the entire sequential D array, then drop the 'fake' d variables at the end, like so:

data have;    
    Col1 = 10;
    Col2 = 35;
    Col3=.;
    Array Dvars {*} d1 d10 d25 d35;
run;
/* Get a list of all actual D variable /*    
proc sql noprint ;    
    select name into :dColumnsToKeep separated by ' '
    From SASHELP.VColumn
    where libname="WORK" and memname = "HAVE"
    AND name LIKE 'd%';
;quit;    
%put &dColumnsToKeep;    

data want (keep=Col: &dColumnsToKeep);    
    set have;
    array AllDVars {*} d1-d9999; *Set d9999 to as big as needed;
    array ColVars {*} Col:;
    do i = 1 to Dim(ColVars);
        if colvars(i) ne . Then AllDvars(Colvars(i)) = 1;
    end;
run;    

This may be quicker processing, since it avoids looping. Though I dont know what the tradeoff is memory-wise to have SAS create 10K or 100K variables in the datastep.

Upvotes: 2

Joe
Joe

Reputation: 63434

It can be done in an array. I'll explain, after the mandatory polemic about data structure.

This looks like it should be a vertical data structure, ie col d variables and multiple lines (with some ID tying them together).

Now, to do this in the structure you have:

You need to use the VNAME function. This allows you to get at the name of the array variable as a string. You can't take col1=10531 and create a statement d10531=1, but you can look at d10531 and compare its value to col1.

This is slow, because you need to loop twice over your variables, unless you have a reliable ordering. Your data above does respect the ordering (ie, COL1-n are in order, and D1-n are in order, so you can move left to right and not loop twice). If this isn't the case, then you may want to use call sortn with the COL array, if that's acceptable. The Dxx array should be able to be defined in proper order (if it's not in proper order on the dataset, you can construct the array statement in a macro variable ordering the variables there - the order on the array statement matters, the order in the dataset does not, unless you're using d:.)

Here's an example of the left to right structure.

data want;
set have;
array cols col1-col2000;
array ds d1:; *or whatever defines d;
_citer=1;
do _diter = 1 to dim(ds) while (_citer le dim(cols)); *safety check;
  if compress(vname(ds[_diter]),,'kd') = cols[_citer] then do;
    ds[_diter] = 1;
    _citer+1;
  end;
end;
run;

It iterates over ds, checks each one against the current col, and when it finds its match, sets that and then stops. This should be flexible - would work with any structure of ds, even if it has very many values. It will not, however, work if cols is not sorted in ascending order of value. If it's not, you would need to put an inner loop to check each cols variable, meaning you have [dim(ds)*dim(cols)] loop iterations instead of [dim(ds)] loop iterations at most.

Upvotes: 2

Related Questions