user154886
user154886

Reputation: 1

SAS: Looping through Logistic Regression Score table, calculating profit with cut-off, and output cut-off along with profit into seperate table

So to begin, I have completed a binary logistic regression model, and output several tables. I have a 'scored' data set that contains the actual default of a customer (GOODBAD) which is binary. I then have a predicted probability of default (p_1) that ranges from 0 to 1. I then must decide a cut-off value, generate a new variable that is a predicted-default that is now binary.

What I'm attempting to do is loop through potential cut-off values (lets say from .1 to .5 by a step of .1), and then calculate 'profit' from each of these 5 steps. I then want both the cut-off value and the 'profit' value in a separate data set to generate a graph of this relationship so that I may find my maximal profit.

Below is the code I currently have for generating a specific cut-off and it's associated profit. (the proc report shouldn't change at all, as these are pre-determined values for accounts/situations)

%MACRO PROFIT;

%DO I=1 %TO 5;

DATA TEST&i;
SET TRANS.SCORE;
IF P_1 >= .&i THEN preds = 1;
ELSE preds = 0;
RUN;

Data probs&i;
format outcometype $6.;
Set TEST&i (keep=preds goodbad crelim);
crelim2 = crelim/2;
if (preds=1 and goodbad=0) then do;
 outcometype="error2";
 profit =0;
end;
else if (preds=0 and goodbad=1) then do;
 outcometype ="error1";
 profit =-crelim2;
end;
else if (preds=1 and goodbad=1) then do;
 outcometype="valid1";
 profit=0;
end;
else do;
 outcometype="valid2";
 profit=250;
end;
run;

PROC REPORT DATA= probs&i nowd out=table&i;
COLUMN outcometype pct n profit pper1000;
DEFINE outcometype /group width = 8 ;
DEFINE profit /format=dollar15.2;
define pper1000 / computed format=dollar15.2;
/*get the overall number of obs*/
compute before;
overall=n;
endcomp;
compute pper1000;
pper1000 = (profit.sum/n)*1000;
endcomp;
compute before outcometype;
totaln=n;
endcomp;
compute pct;
pct = (totaln/overall);
if _break_ = '_RBREAK_' then pct= (overall/overall);
endcomp;
rbreak after/summarize dol;
RUN;
quit;

Data out;
set table&i;
CALL symput('profitAT', PUT(pper1000));
run;

Data new;
CutOFF = .&i;
profit = &profitAT;
run;

%END;

*proc print data = new; Run;

%MEND PROFIT;
%PROFIT;

The problem I'm having and I cannot seem to resolve is that I am over-writing the previous value with the most current value of both i and pper100 (or the macro variable profitAT).

I dont know how to force this to be seen as observations, if I should be looping through my macro variables, if, since I am placing them in the data step, I should have a separate loop that will check N and then write the observations as N increases, or yet further if there's an alternative I have yet to discover.

Upvotes: 0

Views: 187

Answers (1)

Tom
Tom

Reputation: 51581

Let's ignore the logic that is generating the dataset TABLE&i and the variable PPER1000 and concentrate on the looping aspects. You can use PROC APPEND as a method to aggregate values in your loop.

%macro profit ;
%if %sysfunc(exist(new)) %then %do;
* Remove existing NEW dataset on first pass;
proc delete data=new;
run;
%end;

%do i=1 %to 5;
* ... generate TABLE&i ... ;

* Get value of PPER1000 from last observation of TABLE&i ;
data add;
  set table&i end=eof ;
  if eof;
  cutoff = .&i;
  profit = pper1000;
  keep cutoff profit;
run;
* Accumulate results in NEW dataset. ;
proc append base=new data=add force ;
run;

%end;
proc print data = new; run;
%mend profit ;
%profit;

Upvotes: 0

Related Questions