rambo17

Reputation: 1

Is there a way to model a partially reliable predictor in a classification problem with an imbalanced target?

I would like to share a classification issue I ran into during the modelling process. I have to build a model for an imbalanced binary target using 4 predictors, one of which has wrong values in 45% of the observations. This predictor must be included in the model.


*** What do I have in my data?


*** Possible solutions with pros/cons:

  1. Fit a model with the remediated variable (VarD) and the other predictors after dropping the 45% of observations where RVarD is wrong. That leaves 5500 observations, with target Yes = 20 / No = 5480.
  1. Impute the 45% wrong values of the new remediated variable RVarD from the distribution of the 55% corrected values. Alternatively, discretize RVarD and assign a category to the 45% wrong values based on the 55% correct ones.
  1. Fit one model without the new remediated variable (VarD) and use its coefficients to predict probabilities. Fit a second model with only VarD on the 55% of observations that are correct. Then compare the two sets of probabilities and find a scaling factor that links the two models.
  1. As in the previous option, fit a first model without the remediated variable RVarD and use its coefficients for the initial prediction. Then bring in the mandatory variable RVarD through business rules or an additional layer.
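
To make the second option above (imputing the wrong values from the distribution of the correct ones) more concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the helper `impute_from_valid`, the `is_valid` mask, and the toy data are all hypothetical and not my real data. It also assumes the wrong rows can be identified in advance.

```python
import random


def impute_from_valid(values, is_valid, seed=0):
    """Replace invalid entries with random draws from the valid entries.

    Hypothetical helper: returns (imputed_values, was_imputed_flags), where
    the flag is 1 for rows whose value was replaced and 0 otherwise.
    """
    rng = random.Random(seed)
    valid_pool = [v for v, ok in zip(values, is_valid) if ok]
    if not valid_pool:
        raise ValueError("no valid values to sample from")

    imputed, flags = [], []
    for v, ok in zip(values, is_valid):
        if ok:
            imputed.append(v)  # keep trusted values untouched
            flags.append(0)
        else:
            imputed.append(rng.choice(valid_pool))  # draw from empirical distribution
            flags.append(1)
    return imputed, flags


# Synthetic toy data: 20 rows, of which the last 9 (45%) are marked as wrong.
toy_rng = random.Random(42)
rvard = [round(toy_rng.gauss(10.0, 2.0), 2) for _ in range(20)]
is_valid = [i < 11 for i in range(20)]

rvard_imputed, rvard_was_imputed = impute_from_valid(rvard, is_valid)
```

Sampling this way preserves the marginal distribution of the correct values but ignores any relationship with the other predictors or the target. The `rvard_was_imputed` flag could be passed to the model as an extra predictor so it can treat imputed rows differently.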

Which of these is the most realistic, and how could I improve it? Feel free to propose a different approach; I am open to discussion.

Thanks a lot.

Upvotes: 0

Views: 29

Answers (0)
