Unbalanced model, confused as to what steps to take

Question

This is my first data mining project. I am using SAS Enterprise Miner to train and test a classifier.

I have 3 files at my disposal:

- Training file: 85 input variables and 1 target variable, with 5800+ observations
- Prediction file: 85 input variables with 4000 observations
- Verification file: 1 variable containing the correct predictions for the second file. Since this is an academic project, this file is here to tell us whether we are doing a good job or not.

My problem is that the dataset is unbalanced (95% of 0s and 5% of 1s for the target variable in the training file). So, naturally, I tried to resample the training data using the "sampling node" as described in the following link

I used 2 approaches, which give slightly different results. In general, though, the result I am getting is unsatisfactory:

- Without resampling: the model predicts fewer than ten solicited individuals (target variable = 1) out of 4000 observations.
- With resampling: the model predicts about 1500 solicited individuals out of 4000 observations.
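One possible reading of the gap between these two results, sketched here under assumptions: if the sampling node balanced the classes, the scores the model produces refer to the resampled class mix rather than the original 95/5 mix, so far more observations cross the decision cutoff. The sketch below applies the standard prior-correction formula to map such scores back to the original mix. The 50/50 sample share and the function name `correct_for_resampling` are assumptions made for illustration; they do not come from the SAS flow.

```python
def correct_for_resampling(p_sampled, true_prior, sample_prior):
    """Map a probability estimated on resampled data back to the original class mix."""
    # Reweight each class by the ratio of its original share to its resampled share.
    pos = p_sampled * true_prior / sample_prior
    neg = (1.0 - p_sampled) * (1.0 - true_prior) / (1.0 - sample_prior)
    return pos / (pos + neg)


TRUE_PRIOR = 0.05    # share of 1s in the training file (from the question)
SAMPLE_PRIOR = 0.50  # ASSUMED share of 1s after the sampling node (hypothetical)

for p in (0.50, 0.80, 0.95):
    adjusted = correct_for_resampling(p, TRUE_PRIOR, SAMPLE_PRIOR)
    print(f"score on balanced sample {p:.2f} -> adjusted {adjusted:.3f}")
```

Under those assumptions, a score of 0.50 on the balanced sample corresponds to roughly 0.05 on the original mix, and a score of about 0.95 is needed to reach 0.50. A 0.50 cutoff applied to unadjusted scores would therefore flag many more observations than the same cutoff applied to adjusted ones.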

I am looking for 100 to 200 solicited individuals to have a model that would be considered acceptable.
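As a rough check on that target, the arithmetic can be written out. This is a minimal sketch that assumes the 5% base rate of the training file carries over to the prediction file; the counts come from the figures above, with "fewer than ten" treated as an upper bound of 10.

```python
# Counts taken from the question; the base rate is ASSUMED to be the same
# in the prediction file as in the training file.
n_prediction = 4000
positive_rate_train = 0.05

expected_positives = n_prediction * positive_rate_train

predicted_without_resampling = 10   # "fewer than ten", used as an upper bound
predicted_with_resampling = 1500    # "about 1500"


def flagged_share(n_flagged, n_total=n_prediction):
    """Fraction of observations the model labels as 1."""
    return n_flagged / n_total


print(f"expected positives: {expected_positives:.0f}")
print(f"share flagged without resampling: {flagged_share(predicted_without_resampling):.4f}")
print(f"share flagged with resampling: {flagged_share(predicted_with_resampling):.4f}")
```

If the assumption holds, about 200 positives would be expected, which sits at the top of the 100 to 200 range. One model flags well under 1% of the observations and the other flags more than a third of them.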

Why do you think our predictions are so far off in both directions, and how can we remedy this situation?

Here is a screenshot of both models


Answers (1)
