I am trying to use Weka to classify text. This is what I do:
1. all_of_it.arff
2. train.arff and test.arff
3. train_fs.arff (the training set reduced to the features I selected)
The problem is this:
I don't quite know how to standardize the test set so that it only uses the features I selected from the training set. What I want is something like creating a new test file from test.arff according to train_fs.arff.
I tried using
java -cp weka.jar weka.filters.unsupervised.attribute.Standardize -b -i train_fs.arff -o train2.arff -r test.arff -s test2.arff
but I got the infamous "Src and Dest differ in # of attributes" error.
Is there any way to normalize/standardize the sets according to an ARFF file (namely my new training data with only a few features)? I don't see how to do this with the Standardize or StringToWordVector filter.
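You may also want to look into InputMappedClassifier. It is a wrapper classifier that addresses incompatible training and test data: it builds a mapping between the attributes the wrapped classifier was trained on and the structure of the incoming test instances, matching attributes by name. Model attributes that are not found in the test data receive missing values.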
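To make the name-based mapping concrete, here is a minimal, dependency-free Java sketch of the idea. It does not use Weka's API: `AttributeMappingSketch`, `mapToTrainingHeader`, and the example attribute names are hypothetical, instances are plain `Map<String, Double>` values, and `Double.NaN` stands in for a missing value.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AttributeMappingSketch {

    /**
     * Aligns one test instance with the training header by attribute name.
     * Training attributes absent from the test instance become NaN (standing in
     * for a missing value); attributes that only exist in the test data are ignored.
     */
    static double[] mapToTrainingHeader(List<String> trainingAttributes,
                                        Map<String, Double> testInstance) {
        double[] mapped = new double[trainingAttributes.size()];
        for (int i = 0; i < trainingAttributes.size(); i++) {
            Double value = testInstance.get(trainingAttributes.get(i));
            mapped[i] = (value == null) ? Double.NaN : value;
        }
        return mapped;
    }

    public static void main(String[] args) {
        // Hypothetical header of train_fs.arff after feature selection.
        List<String> trainFsHeader = List.of("weka", "classify", "text");

        // Hypothetical test instance that still carries attributes from the full test.arff.
        Map<String, Double> testRow = new LinkedHashMap<>();
        testRow.put("weka", 1.0);
        testRow.put("arff", 2.0);
        testRow.put("text", 3.0);

        // prints [1.0, NaN, 3.0]
        System.out.println(Arrays.toString(mapToTrainingHeader(trainFsHeader, testRow)));
    }
}
```

The output always has the training header's length and order, which is what lets a model built on train_fs.arff consume instances that were never reduced to the selected features.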
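Batch filtering is one solution to your problem: the filter is initialized on the training set and then applied, unchanged, to the test set, so both output files end up with the same attributes.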
Pros:
Cons:
You can read more about Batch filtering here.
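A minimal sketch of the batch-filtering idea in plain Java, with no Weka dependency: the filter learns its parameters from the training batch once and then applies exactly the same transformation to the test batch, so both outputs share one header. The class and method names are hypothetical, and dropping constant columns is only a stand-in for a real attribute-selection step; it is not what Weka's attribute-selection filters do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchFilterSketch {

    // Parameters learned from the first (training) batch only.
    private List<String> inputHeader;
    private final List<Integer> keptColumns = new ArrayList<>();
    private double[] means;
    private double[] stds;

    /** Learns which columns to keep and their mean/std from the training batch. */
    void fit(List<String> header, double[][] train) {
        inputHeader = List.copyOf(header);
        keptColumns.clear();
        List<Double> m = new ArrayList<>();
        List<Double> s = new ArrayList<>();
        for (int c = 0; c < header.size(); c++) {
            double mean = 0;
            for (double[] row : train) mean += row[c];
            mean /= train.length;
            double var = 0;
            for (double[] row : train) var += (row[c] - mean) * (row[c] - mean);
            double std = Math.sqrt(var / train.length); // population standard deviation
            if (std > 0) { // stand-in "feature selection": drop constant columns
                keptColumns.add(c);
                m.add(mean);
                s.add(std);
            }
        }
        means = m.stream().mapToDouble(Double::doubleValue).toArray();
        stds = s.stream().mapToDouble(Double::doubleValue).toArray();
    }

    /** Header shared by every transformed batch. */
    List<String> outputHeader() {
        List<String> out = new ArrayList<>();
        for (int c : keptColumns) out.add(inputHeader.get(c));
        return out;
    }

    /** Applies the already-learned transformation; the batch must have the training input header. */
    double[][] apply(List<String> header, double[][] batch) {
        if (!header.equals(inputHeader)) {
            throw new IllegalArgumentException("Batch header differs from the training header");
        }
        double[][] out = new double[batch.length][keptColumns.size()];
        for (int r = 0; r < batch.length; r++) {
            for (int k = 0; k < keptColumns.size(); k++) {
                out[r][k] = (batch[r][keptColumns.get(k)] - means[k]) / stds[k];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> header = List.of("a", "b", "c");
        double[][] train = { {1, 5, 0}, {3, 5, 2} };
        double[][] test = { {2, 7, 4} };

        BatchFilterSketch filter = new BatchFilterSketch();
        filter.fit(header, train);                                          // learn from training only
        System.out.println(filter.outputHeader());                          // prints [a, c]
        System.out.println(Arrays.deepToString(filter.apply(header, train))); // prints [[-1.0, -1.0], [1.0, 1.0]]
        System.out.println(Arrays.deepToString(filter.apply(header, test)));  // prints [[0.0, 3.0]]
    }
}
```

Note that the test value 7 in column "b" never influences the result: what is kept and how it is scaled is decided by the training batch alone. In Weka the same pattern is what the `-b` option with `-i`/`-o` (training) and `-r`/`-s` (test) from the question provides; the key is to run the feature-selection filter this way on train.arff and test.arff together, rather than standardizing the already-reduced train_fs.arff against the full test.arff.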