Reputation: 1
Alright, here's the table I have (much bigger) and with different styles (a,B,C,D...).
+-------+-------+---------+
| Style | Range | Int |
+-------+-------+---------+
| A | 0-1 | . |
| A | 1-5 | . |
| A | 5-11 | . |
| A | 11-12 | . |
| A | 12-24 | -999999 |
| A | 24-36 | 0 |
| A | 36-48 | 0 |
| A | 48-60 | -999999 |
| A | 60+ | 0 |
+-------+-------+---------+
I want to change every single -999999 values based on a sequence that goes like this:
if(int = -999999) and range ='12-24'
then get the first non null value in int from
11-12, 5-11, 1-5, 0-1 24-36, 36-48, 48-60, 60+ in that order
if(int = -999999) and range ='0-1' or '1-5' or '5-11' or '11-12'
then get the first non null value in int from
12-24, 11-12, 5-11, 1-5, 0-1 24-36, 36-48, 48-60, 60+ in that order without looking up self
if(int = -999999) and range ='24-36'
then get the first non null value in int from
36-48, 48-60, 60+, 12-24, 11-12, 5-11, 1-5, 0-1 in that order
if(int = -999999) and range ='36-48' or '48-60' or '60+'
then get the first non null value in int from
24-36, 36-48, 48-60, 60+, 12-24, 11-12, 5-11, 1-5, 0-1 in that order without looking up self
This would get me this table:
+-------+-------+---------+
| Style | Range | Int |
+-------+-------+---------+
| A | 0-1 | . |
| A | 1-5 | . |
| A | 5-11 | . |
| A | 11-12 | . |
| A | 12-24 | 0 |
| A | 24-36 | 0 |
| A | 36-48 | 0 |
| A | 48-60 | 0 |
| A | 60+ | 0 |
+-------+-------+---------+
Hope this is clear. Is there an easier approach to do this?
Thanks
Upvotes: 0
Views: 53
Reputation: 27508
For each style processed you will have to store all the int
values in an array (or hash) and perform your peculiar backwards then forwards rules centrized on 12-24
.
Q: Suppose your sample data had 0-1
-999999
. The rules would pick out -999999
from range 12-24
. Do you search once until non-missing
, or multiple until non-missing
and non-<special-value>
. Perhaps you think such data will not occur, but data don't care what you think :)
Presume every style has all nine ranges. Serial DOW looping will let you array the groups int
s as you go down the items of group, and then process each item according to the lookup rules. The rules can be encoded in a delta table that indicates where the next lookup value is.
data have; infile datalines dlm='|';
input
Style$ Range$ Int; datalines;
| A | 0-1 | . |
| A | 1-5 | . |
| A | 5-11 | . |
| A | 11-12 | . |
| A | 12-24 | -999999 |
| A | 24-36 | 0 |
| A | 36-48 | 0 |
| A | 48-60 | -999999 |
| A | 60+ | 0 |
run;
data want;
array ints(9) int_1 - int_9;
do _n_ = 1 by 1 until (last.style);
set have;
by style;
ints(_n_) = int;
end;
do _n_ = 1 to _n_;
set have;
if int = -999999 then do;
* traversal data for look up rules;
array rule[9,9] _temporary_ (
/* 0-1 */ +4 +3 +2 +1 +0 +5 +6 +7 +8
/* 1-5 */ +3 +2 +1 +0 -1 +4 +5 +6 +7
/* 5-11 */ +2 +1 +0 -1 -2 +3 +4 +5 +6
/* 11-12 */ +1 +0 -1 -2 -3 +2 +3 +4 +5
/* 12-24 */ -1 -2 -3 -4 +0 +1 +2 +3 +4
/* 24-36 */ +1 +2 +3 +0 -1 -2 -3 -4 -5
/* 36-48 */ -1 +0 +1 +2 -2 -3 -4 -5 -6
/* 48-60 */ -2 -1 +0 +1 -3 -4 -5 -6 -7
/* 60+ */ -3 -2 -1 +0 -4 -5 -6 -7 -8
);
do _m_ = 1 to 9 while (int = -999999);
step = rule(_n_,_m_);
if ints(_n_ + step) not in (., -999999) then
int = ints(_n_ + step);
end;
end;
output;
end;
drop int_: _m_ step;
run;
@Whymath asks how the rule
array is made.
The coding presumes every group has every range, and the ranges are sorted in the order stated in the question.
Array
Range Index
----- -----
0-1 1
1-5 2
5-11 3
11-12 4
12-24 5
24-36 6
36-48 7
48-60 8
60+ 9
The range array index is used in the first dimension of the rules
array.
So, what is done when -999999 found for range 0-1 ? The second if tells you
if(int = -999999) and range ='0-1' or '1-5' or '5-11' or '11-12' then get the first non null value in int from 12-24, 11-12, 5-11, 1-5, 0-1 24-36, 36-48, 48-60, 60+ in that order
The first DO
loop fills the ints
array (of 9 values) with the int
values of the group (presuming to have a value for each range).
So when -999999 found @ 0-1 the if says lookup for non-null, non -999999 in range lookup order:
0-1 range lookup index delta
index lookup index [0-1] index to lookup index
----- ------ ------ --------
1 12-24 5 +4
1 11-12 4 +3
1 5-11 3 +2
1 1-5 2 +1
1 0-1 1 0
1 24-36 6 +5
1 36-48 7 +6
1 48-60 8 +7
1 60+ 9 +8
Looking down the index delta, those values become the steps to apply when searching for non-null non-999999. The index delta is the second dimension in the rules
table so the values go across in the 2-D array initialization.
Repeat the logic conversion to 'steps' for the 8 other ranges and you get the 9x9 rules
matrix.
If the ranges are not necessarily complete, the values in each group and the searching rules can be done with using hashes instead of arrays. Here is an example of using Proc DS2
and a multidata hash for rules.
* implement loop over group_lookup[logic_lookup[range]];
* the inner lookup result becomes the key for the outer lookup;
proc ds2;
data want(overwrite=yes keep=(style range "int"));
declare char(8) style;
declare char(8) range key;
declare int "int" value;
declare package hash group_lookup([key], [value]);
declare package hash logic_lookup();
method init();
declare char(8) keys[9];
declare int index;
logic_lookup.keys([range]);
logic_lookup.data([key]);
logic_lookup.multiData('yes');
logic_lookup.defineDone();
keys := ('12-24' '11-12' '5-11' '1-5' '0-1' '24-36' '36-48' '48-60' '60+');
range = '0-1'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
range = '1-5'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
range = '5-11'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
range = '11-12'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
range = '12-24'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
keys := ('24-36' '36-48' '48-60' '60+' '12-24' '11-12' '5-11' '1-5' '0-1' '24-36');
range = '24-36'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
range = '36-48'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
range = '48-60'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
range = '60+'; do index = 1 to dim(keys); key = keys[index]; if key ne range then logic_lookup.add(); end;
end;
method findReplacement(char(10) range, int in_value) returns int;
declare int result rc;
if in_value not in (. -999999) then return in_value;
if logic_lookup.find() eq 0 then do; * hash host variables [range] and [key] are global, do not mask them with local variables of the same name;
if group_lookup.find() eq 0 then do; * hash host variables [keys] and [value] are global, do not mask them with local variables of the same name;
if value not in (. -999999) then return value;
do while (logic_lookup.has_next() = 0); * search the other rule ranges listed in the multidata;
if logic_lookup.find_next() = 0
& group_lookup.find() = 0
& value not in (. -999999) then return value;
end;
end;
end;
else do;
put 'ERROR: Invalid range in data,' range=;
end;
return in_value;
end;
method run();
declare int rc;
declare double index;
group_lookup.clear();
do index = 1 to CONSTANT('BIG') until(last.style);
set have(locktable=share);
by style;
group_lookup.add([range],["int"]); * key -> value;
end;
do index = 1 to index;
set have;
"int" = findReplacement(range, "int");
output;
end;
end;
enddata;
run;
quit;
%let syslast = want;
Upvotes: 1