SPSS: Calculating the Information Value for several variables within a Macro

Question

The first step in the process of developing a credit risk scorecard is to assess which variables are predictive.

To do this, the information value for the variable is calculated. In Excel it is quite simple: Information Value is the sum of iv*1000. iv=woe*(%Goods-%Bads) woe=ln(%Goods/%Bads)

So, in the example below my variables has 3 values (1,2,3). Based on the distribution shown, the information value works out at 22.738.

Attributes %Bads out of Total %Goods out of total woe iv 1 59% 66% 0.114653893 0.008 2 36% 30% -0.168842887 0.009 3 5% 4% -0.33749397 0.005 Total 100% 100% 22.738

In building a scorecard I will look at anywhere between 100 to 300 variables to find the most predictive ones. Therefore a macro is needed. Ideally I'd want to also automatically band continuous variables such as income into for example 10 equal bands so that the information value makes more sense.

I had some code in SAS to do the above (the automatic banding was done separately however). I started to convert it into SPSS language but I couldnt understand how to replace the CALL SYMPUT lines. I'm hoping someone can easily take a look at the SAS code and translate it into SPSS.

Here is the SAS code.

/* calculating total values for goods, bads and total */
/* the values are output to a data set and called later in a macro */

PROC MEANS DATA=test NOPRINT MAXDEC=4;              
VAR GOOD BAD ;                        

OUTPUT OUT=TOTALS SUM = GTC BTC ;              
RUN;

DATA TOTALS;                                              
SET TOTALS;                                              

TTC = SUM(OF GTC BTC);                                

CALL SYMPUT('GTC',GTC);                                   
CALL SYMPUT('BTC',BTC);                                   
CALL SYMPUT('TTC',TTC); 
run;

/* Calculation of information value */

%macro infov(var);

PROC SUMMARY DATA=test;                        
CLASS &var;                                        
VAR GOOD BAD ;                                        
OUTPUT OUT=RESULTS SUM=GC BC;                             
RUN;

DATA RESULTS;                                            
SET RESULTS;                                             
BY &var;                                               

IF _TYPE_=1;                                             

IF GC=. THEN GC=0;                                      
IF GC NE 0 THEN GP= GC/>C*100;                        
ELSE GP=0;                                              
GCP + GP;                                               

IF BC=. THEN BC=0;                                      
IF BC NE 0 THEN BP= BC/&BTC*100;             
ELSE BP=0;                                   
BCP + BP;                                    

format iv&var 5.2;                          
iv&var=0;
IV&var=(GP-BP)*log(GP/BP);

run; 

PROC MEANS DATA=results NOPRINT MAXDEC=4;              
VAR iv&var;                        
OUTPUT OUT=iv&var SUM = ivTC&var;              
RUN;

DATA IV&VAR (KEEP = &VAR); SET IV&VAR; 
RENAME IVTC&VAR=&VAR;
RUN;


%mend infov;
%infov(app_1_age);
%infov(app_1_employment_status);
%infov(app_1_marital_status);



DATA ALL; MERGE IV:;
RUN;

PROC TRANSPOSE DATA=ALL OUT=ALL;  RUN;
DATA ALL; SET ALL; RENAME COL1=iv _name_=Variable; RUN;
PROC SORT DATA=ALL;
BY DESCENDING iv;
RUN; 
PROC PRINT; 
title 'Information Value';RUN;

Thank you. Elisabeth

SPSS: Calculating the Information Value for several variables within a Macro

Answers (1)

Related Questions