Imad

Reputation: 2741

Unbalanced model, confused as to what steps to take

This is my first data mining project. I am using SAS Enterprise Miner to train and test a classifier.

I have 3 files at my disposal:

  1. Training file : 85 input variables and 1 target variable, with 5800+ observations
  2. Prediction file : 85 input variables with 4000 observations
  3. Verification file : 1 variable containing the correct predictions for the second file. Since this is an academic project, this file is here to tell us if we are doing a good job or not.

My problem is that the dataset is unbalanced (95% 0s and 5% 1s for the target variable in the training file). So naturally, I tried to re-sample the data using the "sampling node" as described in the following link

Here are the 2 approaches I used. They give slightly different results, but here is the general unsatisfactory result I am getting:

I am looking for 100 to 200 solicited individuals to have a model that would be considered acceptable.

Why do you think our predictions are so far off, and how can we remedy this situation?

Here is a screen shot of both models

Model 1 Model 2

Upvotes: 2

Views: 732

Answers (1)

Masoud

Reputation: 1351

There are several techniques for dealing with unbalanced data. One that I remember from many years ago is this approach:

  1. Say you have 100 solicited (minority) observations, which make up 5% of all your observations.
  2. Cluster the non-solicited (majority) class into 20 groups, each with about 100 non-solicited observations, using a clustering algorithm such as k-means, mean shift, DBSCAN, etc.
  3. For each cluster of majority observations, create a dataset that also contains all 100 solicited (minority) observations. This gives you 20 datasets, each balanced with 100 solicited and 100 non-solicited observations.
  4. Train a separate model on each balanced dataset.
  5. At prediction time, run all 20 models and take a vote. For example, if 15 out of 20 models say an observation is solicited, classify it as solicited.
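
The steps above can be sketched in plain Python. This is only an illustration under several assumptions: the function names (`split_majority`, `train_ensemble`, `predict`) are hypothetical, a random split stands in for the clustering step, a nearest-centroid rule stands in for whatever classifier you would actually train, and the toy data is made up (10 minority rows and 200 majority rows instead of 100 and 2000).

```python
import random
from statistics import mean

def split_majority(majority, n_groups, seed=0):
    """Shuffle the majority rows and deal them into n_groups chunks.
    Assumption: a random split replaces the clustering step."""
    rows = majority[:]
    random.Random(seed).shuffle(rows)
    return [rows[i::n_groups] for i in range(n_groups)]

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_ensemble(minority, majority, n_groups):
    """Fit one placeholder model per balanced subset.
    Each model is a (minority centroid, majority-chunk centroid) pair."""
    pos = centroid(minority)
    return [(pos, centroid(chunk)) for chunk in split_majority(majority, n_groups)]

def predict(models, x, min_votes):
    """Return 1 (solicited) if at least min_votes models vote for class 1."""
    votes = sum(sq_dist(x, pos) < sq_dist(x, neg) for pos, neg in models)
    return int(votes >= min_votes)

# Toy data: minority rows sit near (3, 3), majority rows near (0, 0).
minority = [[3 + 0.1 * i, 3 - 0.1 * i] for i in range(10)]
majority = [[(i % 7) * 0.1, (i % 5) * 0.1] for i in range(200)]

models = train_ensemble(minority, majority, n_groups=20)
print(predict(models, [3.0, 3.0], min_votes=15))  # point near the minority class
print(predict(models, [0.2, 0.2], min_votes=15))  # point near the majority class
```

The `min_votes` threshold plays the role of the "15 out of 20" rule: lowering it flags more observations as solicited, raising it flags fewer.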

Upvotes: 1
