wimz
wimz

Reputation: 1

Modifying values of a column based on sequential look up of other values in a determined range

Alright, here's the table I have (much bigger) and with different styles (a,B,C,D...).

+-------+-------+---------+
| Style | Range |   Int   |
+-------+-------+---------+
| A     | 0-1   | .       |
| A     | 1-5   | .       |
| A     | 5-11  | .       |
| A     | 11-12 | .       |
| A     | 12-24 | -999999 |
| A     | 24-36 | 0       |
| A     | 36-48 | 0       |
| A     | 48-60 | -999999 |
| A     | 60+   | 0       |
+-------+-------+---------+

I want to change every single -999999 values based on a sequence that goes like this:

if(int = -999999) and range ='12-24'
   then get the first non null value in int from
   11-12, 5-11, 1-5, 0-1 24-36, 36-48, 48-60, 60+ in that order

if(int = -999999) and range ='0-1' or '1-5' or '5-11' or '11-12'
   then get the first non null value in int from
   12-24, 11-12, 5-11, 1-5, 0-1 24-36, 36-48, 48-60, 60+ in that order without looking up self

if(int = -999999) and range ='24-36'
   then get the first non null value in int from
   36-48, 48-60, 60+, 12-24, 11-12, 5-11, 1-5, 0-1 in that order 

if(int = -999999) and range ='36-48' or '48-60' or '60+'
   then get the first non null value in int from
   24-36, 36-48, 48-60, 60+, 12-24, 11-12, 5-11, 1-5, 0-1 in that order without looking up self

This would get me this table:

+-------+-------+---------+
| Style | Range |   Int   |
+-------+-------+---------+
| A     | 0-1   | .       |
| A     | 1-5   | .       |
| A     | 5-11  | .       |
| A     | 11-12 | .       |
| A     | 12-24 | 0       |
| A     | 24-36 | 0       |
| A     | 36-48 | 0       |
| A     | 48-60 | 0       |
| A     | 60+   | 0       |
+-------+-------+---------+

Hope this is clear. Is there an easier approach to do this?

Thanks

Upvotes: 0

Views: 53

Answers (1)

Richard
Richard

Reputation: 27508

For each style processed you will have to store all the int values in an array (or hash) and perform your peculiar backwards then forwards rules centrized on 12-24.

Q: Suppose your sample data had 0-1 -999999. The rules would pick out -999999 from range 12-24. Do you search once until non-missing, or multiple until non-missing and non-<special-value>. Perhaps you think such data will not occur, but data don't care what you think :)

Presume every style has all nine ranges. Serial DOW looping will let you array the groups ints as you go down the items of group, and then process each item according to the lookup rules. The rules can be encoded in a delta table that indicates where the next lookup value is.

data have; infile datalines dlm='|';
input 
  Style$  Range$    Int; datalines;
| A     | 0-1   | .       |
| A     | 1-5   | .       |
| A     | 5-11  | .       |
| A     | 11-12 | .       |
| A     | 12-24 | -999999 |
| A     | 24-36 | 0       |
| A     | 36-48 | 0       |
| A     | 48-60 | -999999 |
| A     | 60+   | 0       |
run;

data want;
  array ints(9) int_1 - int_9;

  do _n_ = 1 by 1 until (last.style);
    set have;
    by style;
    ints(_n_) = int;
  end;

  do _n_ = 1 to _n_;
    set have;

    if int = -999999 then do;
      * traversal data for look up rules;
      array rule[9,9] _temporary_ (
        /* 0-1   */ +4 +3 +2 +1 +0 +5 +6 +7 +8
        /* 1-5   */ +3 +2 +1 +0 -1 +4 +5 +6 +7
        /* 5-11  */ +2 +1 +0 -1 -2 +3 +4 +5 +6
        /* 11-12 */ +1 +0 -1 -2 -3 +2 +3 +4 +5
        /* 12-24 */ -1 -2 -3 -4 +0 +1 +2 +3 +4
        /* 24-36 */ +1 +2 +3 +0 -1 -2 -3 -4 -5
        /* 36-48 */ -1 +0 +1 +2 -2 -3 -4 -5 -6
        /* 48-60 */ -2 -1 +0 +1 -3 -4 -5 -6 -7
        /* 60+   */ -3 -2 -1 +0 -4 -5 -6 -7 -8
      );

      do _m_ = 1 to 9 while (int = -999999); 
        step = rule(_n_,_m_);
        if ints(_n_ + step) not in (., -999999) then 
          int = ints(_n_ + step);
      end;
    end;

    output;
  end;

  drop int_: _m_ step;
run;

@Whymath asks how the rule array is made.

The coding presumes every group has every range, and the ranges are sorted in the order stated in the question.

      Array
Range Index
----- -----
 0-1    1
 1-5    2
 5-11   3
 11-12  4
 12-24  5
 24-36  6
 36-48  7
 48-60  8
 60+    9

The range array index is used in the first dimension of the rules array.

So, what is done when -999999 found for range 0-1 ? The second if tells you

if(int = -999999) and range ='0-1' or '1-5' or '5-11' or '11-12' then get the first non null value in int from 12-24, 11-12, 5-11, 1-5, 0-1 24-36, 36-48, 48-60, 60+ in that order

The first DO loop fills the ints array (of 9 values) with the int values of the group (presuming to have a value for each range).

So when -999999 found @ 0-1 the if says lookup for non-null, non -999999 in range lookup order:

0-1     range   lookup   index delta
index   lookup  index    [0-1] index to lookup index
-----   ------  ------   --------
  1     12-24     5        +4
  1     11-12     4        +3
  1      5-11     3        +2
  1      1-5      2        +1
  1      0-1      1         0
  1     24-36     6        +5
  1     36-48     7        +6
  1     48-60     8        +7
  1     60+       9        +8

Looking down the index delta, those values become the steps to apply when searching for non-null non-999999. The index delta is the second dimension in the rules table so the values go across in the 2-D array initialization.

Repeat the logic conversion to 'steps' for the 8 other ranges and you get the 9x9 rules matrix.

If the ranges are not necessarily complete, the values in each group and the searching rules can be done with using hashes instead of arrays. Here is an example of using Proc DS2 and a multidata hash for rules.

* implement loop over group_lookup[logic_lookup[range]]; 
* the inner lookup result becomes the key for the outer lookup;

proc ds2;
  data want(overwrite=yes keep=(style range "int"));
    declare char(8) style;
    declare char(8) range key;
    declare int "int" value;

    declare package hash group_lookup([key], [value]);
    declare package hash logic_lookup();

    method init();
      declare char(8) keys[9];
      declare int index;

      logic_lookup.keys([range]);
      logic_lookup.data([key]);
      logic_lookup.multiData('yes');
      logic_lookup.defineDone();

      keys := ('12-24' '11-12' '5-11' '1-5' '0-1' '24-36' '36-48' '48-60' '60+');

      range = '0-1';   do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
      range = '1-5';   do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
      range = '5-11';  do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
      range = '11-12'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
      range = '12-24'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;

      keys := ('24-36' '36-48' '48-60' '60+' '12-24' '11-12' '5-11' '1-5' '0-1' '24-36');

      range = '24-36'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
      range = '36-48'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
      range = '48-60'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
      range = '60+';   do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
    end;

    method findReplacement(char(10) range, int in_value) returns int;
      declare int result rc;

      if in_value not in (. -999999) then return in_value;

      if logic_lookup.find() eq 0 then do; * hash host variables [range] and [key] are global, do not mask them with local variables of the same name;
        if group_lookup.find() eq 0 then do; * hash host variables [keys] and [value] are global, do not mask them with local variables of the same name;

          if value not in (. -999999) then return value;

          do while (logic_lookup.has_next() = 0); * search the other rule ranges listed in the multidata;
            if logic_lookup.find_next() = 0 
             & group_lookup.find() = 0
             & value not in (. -999999) then return value;
          end;
        end;
      end;
      else do;
        put 'ERROR: Invalid range in data,' range=;
      end;

      return in_value;
    end;

    method run();
      declare int rc;
      declare double index;

      group_lookup.clear();

      do index = 1 to CONSTANT('BIG') until(last.style);
        set have(locktable=share);
        by style;
        group_lookup.add([range],["int"]);  * key -> value;
      end;

      do index = 1 to index;
        set have;
        "int" = findReplacement(range, "int");
        output;
      end;
    end;

  enddata;
run;
quit;

%let syslast = want;

Upvotes: 1

Related Questions