My original dataset has a binary dependent variable in which only 3% of the values are 1. First, I split it into train and test sets (80/20). The independent variables include both numeric and binary features, so I apply SMOTENC to the training set only, which gives me a balanced training set. I then fit a logistic regression model on the balanced training set and use the F-measure to choose the optimal probability cutoff.

But what cutoff should I use on the test set? The test set is still imbalanced, and applying the cutoff that was optimal on the balanced training set gives disastrous results.
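For concreteness, the cutoff-selection step described above (picking the cutoff that maximizes the F-measure) can be sketched as below. This is a minimal, standard-library-only sketch. It assumes `scores` are the model's predicted probabilities for class 1 (in scikit-learn, `predict_proba(X)[:, 1]`), that the F-measure means F1, and that every distinct score is tried as a candidate cutoff. The helper names `f1_at_threshold` and `best_f1_cutoff` are hypothetical, and the SMOTENC resampling step is not shown.

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], scores: Sequence[float], t: float) -> float:
    """F1 score when every observation with score >= t is predicted as 1."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= t else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
    return 2 * tp / denom if denom else 0.0


def best_f1_cutoff(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Try each distinct score as a cutoff; return (cutoff, F1) with the highest F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, t)
        if f1 > best_f1:  # strict '>' keeps the lowest cutoff among ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    # Toy labels and predicted probabilities, for illustration only.
    y = [0, 0, 1, 1, 0, 1]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
    print(best_f1_cutoff(y, s))
```

On the toy data the best cutoff is 0.35, with F1 = 6/7 (about 0.857). The sweep depends on both the scores and the class mix of whatever data it is run on, so a cutoff chosen on the balanced (resampled) training set is tuned to that set's 50/50 class mix rather than to the roughly 3% positive rate in the test data.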