Reputation: 1
So to begin, I have completed a binary logistic regression model, and output several tables. I have a 'scored' data set that contains the actual default of a customer (GOODBAD) which is binary. I then have a predicted probability of default (p_1) that ranges from 0 to 1. I then must decide a cut-off value, generate a new variable that is a predicted-default that is now binary.
What I'm attempting to do is loop through potential cut-off values (lets say from .1 to .5 by a step of .1), and then calculate 'profit' from each of these 5 steps. I then want both the cut-off value and the 'profit' value in a separate data set to generate a graph of this relationship so that I may find my maximal profit.
Below is the code I currently have for generating a specific cut-off and it's associated profit. (the proc report shouldn't change at all, as these are pre-determined values for accounts/situations)
%MACRO PROFIT;
%DO I=1 %TO 5;
DATA TEST&i;
SET TRANS.SCORE;
IF P_1 >= .&i THEN preds = 1;
ELSE preds = 0;
RUN;
Data probs&i;
format outcometype $6.;
Set TEST&i (keep=preds goodbad crelim);
crelim2 = crelim/2;
if (preds=1 and goodbad=0) then do;
outcometype="error2";
profit =0;
end;
else if (preds=0 and goodbad=1) then do;
outcometype ="error1";
profit =-crelim2;
end;
else if (preds=1 and goodbad=1) then do;
outcometype="valid1";
profit=0;
end;
else do;
outcometype="valid2";
profit=250;
end;
run;
PROC REPORT DATA= probs&i nowd out=table&i;
COLUMN outcometype pct n profit pper1000;
DEFINE outcometype /group width = 8 ;
DEFINE profit /format=dollar15.2;
define pper1000 / computed format=dollar15.2;
/*get the overall number of obs*/
compute before;
overall=n;
endcomp;
compute pper1000;
pper1000 = (profit.sum/n)*1000;
endcomp;
compute before outcometype;
totaln=n;
endcomp;
compute pct;
pct = (totaln/overall);
if _break_ = '_RBREAK_' then pct= (overall/overall);
endcomp;
rbreak after/summarize dol;
RUN;
quit;
Data out;
set table&i;
CALL symput('profitAT', PUT(pper1000));
run;
Data new;
CutOFF = .&i;
profit = &profitAT;
run;
%END;
*proc print data = new; Run;
%MEND PROFIT;
%PROFIT;
The problem I'm having and I cannot seem to resolve is that I am over-writing the previous value with the most current value of both i and pper100 (or the macro variable profitAT).
I dont know how to force this to be seen as observations, if I should be looping through my macro variables, if, since I am placing them in the data step, I should have a separate loop that will check N and then write the observations as N increases, or yet further if there's an alternative I have yet to discover.
Upvotes: 0
Views: 187
Reputation: 51581
Let's ignore the logic that is generating the dataset TABLE&i and the variable PPER1000 and concentrate on the looping aspects. You can use PROC APPEND as a method to aggregate values in your loop.
%macro profit ;
%if %sysfunc(exist(new)) %then %do;
* Remove existing NEW dataset on first pass;
proc delete data=new;
run;
%end;
%do i=1 %to 5;
* ... generate TABLE&i ... ;
* Get value of PPER1000 from last observation of TABLE&i ;
data add;
set table&i end=eof ;
if eof;
cutoff = .&i;
profit = pper1000;
keep cutoff profit;
run;
* Accumulate results in NEW dataset. ;
proc append base=new data=add force ;
run;
%end;
proc print data = new; run;
%mend profit ;
%profit;
Upvotes: 0