Reputation: 33
The first step in the process of developing a credit risk scorecard is to assess which variables are predictive.
To do this, the information value for the variable is calculated. In Excel it is quite simple: Information Value is the sum of iv*1000. iv=woe*(%Goods-%Bads) woe=ln(%Goods/%Bads)
So, in the example below my variables has 3 values (1,2,3). Based on the distribution shown, the information value works out at 22.738.
Attributes %Bads out of Total %Goods out of total woe iv 1 59% 66% 0.114653893 0.008 2 36% 30% -0.168842887 0.009 3 5% 4% -0.33749397 0.005 Total 100% 100% 22.738
In building a scorecard I will look at anywhere between 100 to 300 variables to find the most predictive ones. Therefore a macro is needed. Ideally I'd want to also automatically band continuous variables such as income into for example 10 equal bands so that the information value makes more sense.
I had some code in SAS to do the above (the automatic banding was done separately however). I started to convert it into SPSS language but I couldnt understand how to replace the CALL SYMPUT lines. I'm hoping someone can easily take a look at the SAS code and translate it into SPSS.
Here is the SAS code.
/* calculating total values for goods, bads and total */
/* the values are output to a data set and called later in a macro */
PROC MEANS DATA=test NOPRINT MAXDEC=4;
VAR GOOD BAD ;
OUTPUT OUT=TOTALS SUM = GTC BTC ;
RUN;
DATA TOTALS;
SET TOTALS;
TTC = SUM(OF GTC BTC);
CALL SYMPUT('GTC',GTC);
CALL SYMPUT('BTC',BTC);
CALL SYMPUT('TTC',TTC);
run;
/* Calculation of information value */
%macro infov(var);
PROC SUMMARY DATA=test;
CLASS &var;
VAR GOOD BAD ;
OUTPUT OUT=RESULTS SUM=GC BC;
RUN;
DATA RESULTS;
SET RESULTS;
BY &var;
IF _TYPE_=1;
IF GC=. THEN GC=0;
IF GC NE 0 THEN GP= GC/>C*100;
ELSE GP=0;
GCP + GP;
IF BC=. THEN BC=0;
IF BC NE 0 THEN BP= BC/&BTC*100;
ELSE BP=0;
BCP + BP;
format iv&var 5.2;
iv&var=0;
IV&var=(GP-BP)*log(GP/BP);
run;
PROC MEANS DATA=results NOPRINT MAXDEC=4;
VAR iv&var;
OUTPUT OUT=iv&var SUM = ivTC&var;
RUN;
DATA IV&VAR (KEEP = &VAR); SET IV&VAR;
RENAME IVTC&VAR=&VAR;
RUN;
%mend infov;
%infov(app_1_age);
%infov(app_1_employment_status);
%infov(app_1_marital_status);
DATA ALL; MERGE IV:;
RUN;
PROC TRANSPOSE DATA=ALL OUT=ALL; RUN;
DATA ALL; SET ALL; RENAME COL1=iv _name_=Variable; RUN;
PROC SORT DATA=ALL;
BY DESCENDING iv;
RUN;
PROC PRINT;
title 'Information Value';RUN;
Thank you. Elisabeth
Upvotes: 1
Views: 2566
Reputation: 5417
The similar basic feature in SPSS Statistics would be the DEFINE command to create a macro definition, but you would get much more flexibility and power by using Python programmability. The Python Essentials are a free download via the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) or, for Statistics V21, from the site where you download Statistics itself.
From that same site you can download the Programming and Data Management book from the Books and Articles section. It shows a lot of examples of using this technology in the SPSS context. The full api documentation is available from the IBM Info Center, which is bookmarked on the site.
Python programmability can be used, even, to create macros so that later code can use the built-in macro facility.
HTH, Jon Peck
Upvotes: 1