CraigS

Reputation: 37

Using a SMOTE-Trained Model and Its Optimal Cutoff on Unbalanced Test Data in R

My original dataset has a binary dependent variable in which 3% of the values are 1. I first split the dataset into train and test sets (80-20 split). Because the independent variables are a mix of numeric and binary, I apply SMOTENC to the train set to create a balanced train dataset. I then fit a logistic regression model on the balanced train dataset and use the F-measure as the metric to determine the optimal cutoff. But which cutoff should I use on the test dataset? Because the test set is still unbalanced, applying the optimal cutoff found on the balanced train dataset gives disastrous results.
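
For reference, the workflow described above can be sketched in base R roughly as follows. This is only an illustration on simulated data, not my actual code: the variable names (`x_num`, `x_bin`, `best_cutoff`) and the simulated coefficients are made up, and plain random oversampling of the minority class stands in for SMOTENC so that the snippet runs without extra packages.

```r
# Simulated data (assumption): roughly 3% positives, one numeric and one binary predictor.
set.seed(42)
n <- 5000
x_num <- rnorm(n)
x_bin <- rbinom(n, 1, 0.4)
y <- rbinom(n, 1, plogis(-4.5 + 1.2 * x_num + 0.8 * x_bin))
dat <- data.frame(y = y, x_num = x_num, x_bin = x_bin)

# 80-20 train/test split; the test set keeps the original class imbalance.
idx <- sample(n, size = 0.8 * n)
train <- dat[idx, ]
test <- dat[-idx, ]

# Stand-in for SMOTENC (assumption): resample the minority class with
# replacement until both classes have the same number of rows.
pos <- which(train$y == 1)
neg <- which(train$y == 0)
balanced <- train[c(neg, sample(pos, length(neg), replace = TRUE)), ]

# Logistic regression fitted on the balanced train set.
fit <- glm(y ~ x_num + x_bin, family = binomial, data = balanced)

# F-measure (F1) of the predictions obtained at a given probability cutoff.
f_measure <- function(prob, truth, cutoff) {
  pred <- as.integer(prob >= cutoff)
  tp <- sum(pred == 1 & truth == 1)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
}

# Cutoff that maximizes the F-measure on the balanced train set.
cutoffs <- seq(0.01, 0.99, by = 0.01)
p_bal <- predict(fit, newdata = balanced, type = "response")
f_bal <- sapply(cutoffs, function(ct) f_measure(p_bal, balanced$y, ct))
best_cutoff <- cutoffs[which.max(f_bal)]

# The same cutoff applied to the unbalanced test set.
p_test <- predict(fit, newdata = test, type = "response")
f_test <- f_measure(p_test, test$y, best_cutoff)
cat("cutoff:", best_cutoff, " F (balanced train):", max(f_bal), " F (test):", f_test, "\n")
```

The last line prints the chosen cutoff next to the F-measure on the balanced train set and on the unbalanced test set, which is the comparison I am asking about.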

Upvotes: 0

Views: 42

Answers (0)
