Reputation: 127
data scores;
length variables $ 16;
input variables $ low high score;
datalines;
Debt -10000 1 55
Debt 1 10000 23
MAX_NA -1 1 500
MAX_NA 1 100 -240
;
data main_data;
input ID Debt MAX_NA;
datalines;
222554 7584 12
212552 20 0
883123 500 7
913464 -200 -78
;
data end_result;
input ID Debt MAX_NA score;
datalines;
222554 7584 12 -217
212552 20 0 523
883123 500 7 -185
913464 -200 -78 555
;
Above you'll find three data sets.
The scores data sets depict each variables' score, based on a range of values between low and high columns.
The second data set main_data shows the exact values of Debt and MAX_NA.
end_result table is what I would like to achieve.
What step and statements should I use to calculate the score and get the end_result table?
Upvotes: 1
Views: 235
Reputation: 4937
Simpler:
data end_result(keep=ID Debt MAX_NA score);
set main_data;
score = 0;
do i = 1 to n;
set scores(rename=score=s) point=i nobs=n;
if variables = "Debt" and low <= Debt <= high then score + s;
else if variables = "MAX_NA" and low <= MAX_NA <= high then score + s;
end;
run;
Upvotes: 2
Reputation: 324
Another apprach is to use a double left join like so:
data scores;
length variables $ 16;
input variables $ low high score;
datalines;
Debt -10000 1 55
Debt 1 10000 23
MAX_NA -1 1 500
MAX_NA 1 100 -240
;
data main_data;
input ID Debt MAX_NA;
sortseq = _n_;
datalines;
222554 7584 12
212552 20 0
883123 500 7
913464 -200 -78
;
proc sql;
create table end_result as
select a.ID
,a.Debt
,a.MAX_NA
,coalesce(b.score,0) + coalesce(c.score,0) as score
from main_data as a
left join scores(where=(variables="Debt")) as b
on b.low < a.Debt <= b.high
left join scores(where=(variables="MAX_NA")) as c
on c.low < a.MAX_NA <= c.high
order by a.sortseq
;
quit;
Note that I have included a sortseq variable in main_data to keep the sorting order. Like draycut I get the same score for id 222554 and 883123. For ID 913464 the MAX_NA value is out of range of the scores dataset, so I have counted it as zero by using the coalesce funtion. I therefore get the results:
ID Debt MAX_NA score
222554 7584 12 -217
212552 20 0 523
883123 500 7 -217
913464 -200 -78 55
Upvotes: 2
Reputation: 4937
I don't understand why id 222554 and 883123 do not get the same score?
Anyway, here is an approach you can use as a template.
data end_result;
if _N_ = 1 then do;
dcl hash h(dataset : "scores(rename=score=s)", multidata : "Y");
h.definekey("variables");
h.definedata(all : "Y");
h.definedone();
dcl hiter hi("h");
end;
set main_data;
if 0 then set scores(rename=score=s);
score = 0;
do while (hi.next() = 0);
if variables = "Debt" and low <= Debt <= high then score + s;
else if variables = "MAX_NA" and low <= MAX_NA <= high then score + s;
end;
keep id Debt max_na score;
run;
Result:
ID Debt MAX_NA score
222554 7584 12 -217
212552 20 0 523
883123 500 7 -217
913464 -200 -78 555
Upvotes: 1