Reputation: 23
I am trying to run 5-fold cross-validation on WEKA using a FilteredClassifier with SMOTE.
To my knowledge, I should apply SMOTE in each of the CV folds to obtain my CV error.
Does anyone have documentation or background on how WEKA performs CV in a FilteredClassifier when using Evaluation().crossvalidate_model(INPUTS)?
I am using Python with the weka-wrapper.
Thank you!
Upvotes: 1
Views: 485
Reputation: 2608
Weka treats the FilteredClassifier meta-classifier just like any other classifier (both implement the weka.classifiers.Classifier interface).
If you're performing 5-fold CV, the data gets split into 5 train/test pairs. Each time, the classifier is trained on the training fold and then evaluated on the test fold. The weka.classifiers.Evaluation class records the statistics obtained from the test data of each fold.
In your case (for each train/test fold), the FilteredClassifier uses the training data to initialize the SMOTE filter, passes that training data through the filter, and then builds the base classifier on the filtered training data.
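The per-fold behaviour described above can be sketched in plain Python. This is an illustration only, not Weka code: `oversample` is a hypothetical stand-in that duplicates minority rows (real SMOTE synthesizes new ones), `NearestMean` is a toy base classifier, and the folds are not stratified the way Weka's are for a nominal class. The point is only to show which data the filter sees in each fold.

```python
import random


def oversample(rows, rng):
    """Hypothetical stand-in for SMOTE: duplicate rows until classes are balanced."""
    by_label = {}
    for x, label in rows:
        by_label.setdefault(label, []).append((x, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


class NearestMean:
    """Toy base classifier on one numeric feature: predicts the class with the closest mean."""

    def fit(self, rows):
        sums = {}
        for x, label in rows:
            total, count = sums.get(label, (0.0, 0))
            sums[label] = (total + x, count + 1)
        self.means = {label: total / count for label, (total, count) in sums.items()}

    def predict(self, x):
        return min(self.means, key=lambda label: abs(x - self.means[label]))


def cross_validate(rows, k, seed=1):
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    report = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        # The filter is set up and applied on the training fold only.
        filtered = oversample(train, rng)
        model = NearestMean()
        model.fit(filtered)
        # The test fold is scored as-is; statistics are collected per fold.
        correct = sum(model.predict(x) == label for x, label in test)
        report.append({"train": len(train), "filtered": len(filtered),
                       "test": len(test), "correct": correct})
    return report


# Imbalanced toy data: 40 "neg" rows and 10 "pos" rows.
rows = [(float(i), "neg") for i in range(40)] + [(100.0 + i, "pos") for i in range(10)]
report = cross_validate(rows, k=5)
for fold_stats in report:
    print(fold_stats)
```

Each fold's record shows the training set growing after filtering, while the test fold keeps its original size.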
So the answer is yes, your SMOTE filter gets initialized and applied in each of the CV folds.
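For reference, here is a minimal sketch of how this setup might look with python-weka-wrapper3. It assumes the SMOTE package is already installed through the Weka package manager, that the class attribute is the last one in the ARFF file, and that J48 is an acceptable base classifier. The function name `smote_cv_summary`, the J48 choice, and the `-P 100.0` option value are illustrative assumptions, not something taken from the question.

```python
def smote_cv_summary(arff_path, folds=5, seed=1):
    """Run k-fold CV on a FilteredClassifier (SMOTE + J48) and return the summary text."""
    # Imports are local so the function can be defined without Weka being present.
    import weka.core.jvm as jvm
    from weka.core.classes import Random
    from weka.core.converters import Loader
    from weka.classifiers import Classifier, Evaluation, FilteredClassifier
    from weka.filters import Filter

    # packages=True makes installed Weka packages (such as SMOTE) visible.
    jvm.start(packages=True)
    try:
        loader = Loader(classname="weka.core.converters.ArffLoader")
        data = loader.load_file(arff_path)
        data.class_is_last()  # assumption: class attribute is the last column

        fc = FilteredClassifier()
        fc.filter = Filter(
            classname="weka.filters.supervised.instance.SMOTE",
            options=["-P", "100.0"],  # illustrative oversampling percentage
        )
        fc.classifier = Classifier(classname="weka.classifiers.trees.J48")

        # The unfiltered data goes in; the FilteredClassifier applies SMOTE
        # to the training portion inside each fold.
        evl = Evaluation(data)
        evl.crossvalidate_model(fc, data, folds, Random(seed))
        return evl.summary()
    finally:
        # Note: the JVM cannot be started again in the same process after this.
        jvm.stop()
```

The key design point is that the full, unfiltered dataset is passed to crossvalidate_model. Applying SMOTE to the whole dataset first and then cross-validating would leak synthetic instances into the test folds.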
The official place for Weka questions is the Weka mailing list.
Upvotes: 0