Reputation: 3076
I want to run n-fold cross-validation on some classifiers I am using. I found example code on the WEKA Wiki (WekaDemo.java), but it applies a filter before running the validation. Does this always need to be done, or is it optional?
Here is the section of code:
/**
 * runs 10fold CV over the training file
 */
public void execute() throws Exception {
  // run filter
  m_Filter.setInputFormat(m_Training);
  Instances filtered = Filter.useFilter(m_Training, m_Filter);

  // train classifier on complete file for tree
  m_Classifier.buildClassifier(filtered);

  // 10fold CV with seed=1
  m_Evaluation = new Evaluation(filtered);
  m_Evaluation.crossValidateModel(
      m_Classifier, filtered, 10, m_Training.getRandomNumberGenerator(1));
}
Also, is this an acceptable way of evaluating the performance of a classifier?
Upvotes: 0
Views: 556
Reputation: 189
I'd consider this bad practice. If the filter depends on or uses class information, it has already seen the class labels of the instances that later serve as test folds, so the cross-validation estimate will be (potentially very) optimistically biased, and therefore probably useless. For an extreme example, think about a filter that adds a copy of the class attribute to the data. In almost all cases you will be better off and safer if you use weka.classifiers.meta.FilteredClassifier, which builds the filter from the training data of each fold only. There is an example of how to use it on the same Wiki page you cite.
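To get a feeling for how large the bias can be, here is a minimal, Weka-free sketch in plain Java. Everything in it is an assumption made for illustration: the class name FilterLeakDemo and its methods are hypothetical, the "filter" is a toy supervised attribute selection (keep the single attribute that best matches the class), and the "classifier" is a one-attribute rule. The data is pure noise, so an honest estimate should hover around chance. The only difference between the two runs is whether the selection is done once on all rows before cross-validation, or again inside each training fold (roughly what FilteredClassifier arranges for you).

```java
import java.util.Arrays;
import java.util.Random;

public class FilterLeakDemo {
    static final int FOLDS = 10;

    /** Number of selected rows. */
    static int count(boolean[] rows) {
        int c = 0;
        for (boolean r : rows) if (r) c++;
        return c;
    }

    /** Number of selected rows on which attribute j equals the class label. */
    static int agreements(boolean[][] x, boolean[] y, boolean[] rows, int j) {
        int agree = 0;
        for (int i = 0; i < y.length; i++) {
            if (rows[i] && x[i][j] == y[i]) agree++;
        }
        return agree;
    }

    /** Toy supervised filter: index of the attribute most (anti-)correlated with the class. */
    static int selectBest(boolean[][] x, boolean[] y, boolean[] rows) {
        int total = count(rows);
        int best = 0, bestScore = -1;
        for (int j = 0; j < x[0].length; j++) {
            int score = Math.abs(2 * agreements(x, y, rows, j) - total);
            if (score > bestScore) { bestScore = score; best = j; }
        }
        return best;
    }

    /** 10-fold CV accuracy; filterFirst = true selects the attribute once on ALL rows. */
    static double crossValidate(boolean[][] x, boolean[] y, boolean filterFirst) {
        int n = y.length;
        boolean[] all = new boolean[n];
        Arrays.fill(all, true);
        int global = selectBest(x, y, all); // has seen every class label
        int correct = 0;
        for (int f = 0; f < FOLDS; f++) {
            boolean[] train = new boolean[n];
            for (int i = 0; i < n; i++) train[i] = (i % FOLDS) != f;
            int j = filterFirst ? global : selectBest(x, y, train);
            // One-attribute rule: predict the attribute value, inverted if that fits the training rows better.
            boolean invert = 2 * agreements(x, y, train, j) < count(train);
            for (int i = 0; i < n; i++) {
                if (!train[i] && (x[i][j] ^ invert) == y[i]) correct++;
            }
        }
        return (double) correct / n;
    }

    /** Returns {accuracy with filter before CV, accuracy with filter inside each fold} on noise data. */
    static double[] run(long seed) {
        Random rnd = new Random(seed);
        int n = 40, p = 2000; // few instances, many noise attributes
        boolean[][] x = new boolean[n][p];
        boolean[] y = new boolean[n];
        for (int i = 0; i < n; i++) {
            y[i] = rnd.nextBoolean();
            for (int j = 0; j < p; j++) x[i][j] = rnd.nextBoolean();
        }
        return new double[] { crossValidate(x, y, true), crossValidate(x, y, false) };
    }

    public static void main(String[] args) {
        double[] r = run(1L);
        System.out.println("filter before CV:        " + r[0]);
        System.out.println("filter inside each fold: " + r[1]);
    }
}
```

With this setup the first number should come out well above chance even though there is nothing to learn, while the second should stay near 0.5. Filters that ignore the class (for example, a fixed attribute removal) do not cause this kind of leak.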
cheers, Bernhard
Upvotes: 2