Reputation: 113
I started trying out the Weka GUI application to learn how I want to build my text classifier and I successfully built and saved a model using the GUI.
Now, I want to implement the classifier in Java code. But I can't seem to set the stopwords and tokenizer settings of the StringToWordVector filter in code like I did in the GUI. (See the screenshot.)
(Of course, without the stopwords handler set to NULL.)
I am aware that I can load the model I created and saved in the GUI into my code, but I need to configure the filter itself in Java.
I tried the code from the question "Different results in Weka GUI and Weka via Java code". Mainly this part (after changing the path, of course):
String opt = "-W -P 0 -M 5.0 -norm 1.0 -lnorm 2.0 -lowercase -stoplist -stopwords C:\\Users\\Fernando\\workspace\\GPCommentsAnalyzer\\pt-br_stopwords.dat -tokenizer \"weka.core.tokenizers.NGramTokenizer -delimiters ' \\r\\n\\t.,;:\\\'\\\"()?!\' -max 2 -min 1\" -stemmer weka.core.stemmers.NullStemmer";
But it still doesn't work.
I can't find any documentation about this topic anywhere. Any help would be much appreciated!
(I am using Weka version 3.7.12)
Upvotes: 1
Views: 1316
Reputation: 14701
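Set up the filter configuration in the GUI, then use the "Copy configuration to clipboard" option in the context menu. That gives you the exact class name and option string for your Weka version, which you can paste into your Java code.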
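As a rough sketch of how the copied string could then be used (assuming weka.jar is on the classpath): the copied text starts with the filter's class name followed by its options, so it can be split with `Utils.splitOptions` and instantiated with `Utils.forName`. The class name `ConfiguredFilter`, the method `fromGuiConfig`, and the option string below are placeholders for illustration only; paste your own copied configuration in place of the string.

```java
import weka.core.OptionHandler;
import weka.core.Utils;
import weka.filters.Filter;

public class ConfiguredFilter {

    /**
     * Builds a filter from a configuration string in the form the GUI
     * copies to the clipboard: the class name followed by its options.
     * (Hypothetical helper, not part of Weka.)
     */
    public static Filter fromGuiConfig(String config) throws Exception {
        String[] parts = Utils.splitOptions(config);
        String className = parts[0];
        parts[0] = ""; // blank out the class name so only the options remain
        // forName instantiates the class and passes the options to setOptions
        return (Filter) Utils.forName(Filter.class, className, parts);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder configuration: replace with the string copied from the GUI.
        // Backslashes and double quotes must be escaped in the Java literal.
        String config = "weka.filters.unsupervised.attribute.StringToWordVector "
            + "-W 1000 -M 1 "
            + "-tokenizer \"weka.core.tokenizers.NGramTokenizer -max 2 -min 1\"";

        Filter filter = fromGuiConfig(config);

        // Print the options the filter ended up with, to compare against the GUI.
        System.out.println(Utils.joinOptions(((OptionHandler) filter).getOptions()));
    }
}
```

If an option in the string is not recognized by your Weka version, `Utils.forName` should throw an exception rather than silently ignore it, which makes mismatches between an old option string and a newer release easier to spot. Since the 3.7.12 GUI exposes the stopwords handler and tokenizer as properties, calling the corresponding setters on `StringToWordVector` (`setStopwordsHandler`, `setTokenizer`) is likely an alternative to the option string.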
Upvotes: 1