Reputation: 273
I am working on a big data analysis project and I am stuck at this point. I am trying to load a CSV file and use the WEKA Java API to perform the analysis. I want to tokenize the text, remove stop words, identify parts of speech (POS), and filter the nouns. However, I see the error below and have no idea why. An explanation and a solution would be great!
Error:
Exception in thread "main" java.io.IOException: wrong number of values. Read 21, expected 20, read Token[EOL], line 3
at weka.core.converters.ConverterUtils.errms(ConverterUtils.java:912)
at weka.core.converters.CSVLoader.getInstance(CSVLoader.java:819)
at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:642)
Code:
CSVLoader loader = new CSVLoader();
loader.setSource(new File("C:\\fakepath\\CSVfilesample.csv"));
Instances data = loader.getDataSet();
// save ARFF
ArffSaver saver = new ArffSaver();
saver.setInstances(data);
saver.setFile(new File("C:\\fakepath\\CSVfilesample.arff"));
saver.setDestination(new File("C:\\fakepath\\CSVfilesample.arff"));
saver.writeBatch();
BufferedReader br=null;
br=new BufferedReader(new FileReader("C:\\fakepath\\CSVfilesample.arff"));
Instances train=new Instances(br);
train.setClassIndex(train.numAttributes()-1);
br.close();
NaiveBayes nb=new NaiveBayes();
nb.buildClassifier(train);
Evaluation eval=new Evaluation(train);
eval.crossValidateModel(nb, train, 10, new Random(1));
System.out.println(eval.toSummaryString("\nResults\n=====\n",true));
System.out.println(eval.fMeasure(1)+" "+eval.precision(1)+" "+eval.recall(1));
Upvotes: 1
Views: 10551
Reputation: 4310
This error is generally caused by incorrect formatting in the file being loaded. Here the loader read 21 values on line 3 but expected 20, so that row has one more value than the rest of the file led it to expect. There are a few possible reasons. Check the following points:
Prefer the ARFF format instead of CSV, because it has certain advantages over a CSV file.
Check Can I use CSV.?%2 or something like that.
Check for syntactically incorrect line endings.
Check for any extra commas.
This error tells you that there is a problem with the file contents: they don't follow the format WEKA expects. Fix that and the error will disappear.
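To find the offending rows before handing the file to WEKA, a small stand-alone check along these lines may help. It is only a sketch under stated assumptions: the class and method names (`CsvFieldCountCheck`, `countFields`, `findMismatchedLines`) are made up for illustration, the file is assumed to be UTF-8 with a header row on line 1, fields are assumed to be comma-separated with optional double quotes, and quoted fields are assumed not to span lines. It does not reproduce WEKA's own tokenizing rules, so treat its output as a hint about where to look rather than a guarantee.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of WEKA: flags CSV lines whose field count
// differs from the header's field count.
public class CsvFieldCountCheck {

    // Counts comma-separated fields on one line, ignoring commas that
    // appear inside double-quoted sections.
    public static int countFields(String line) {
        int fields = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;   // an escaped "" toggles twice, so state is unchanged
            } else if (c == ',' && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    // Returns the 1-based numbers of lines whose field count differs from
    // that of the first line (assumed to be the header). Blank lines are skipped.
    public static List<Integer> findMismatchedLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        if (lines.isEmpty()) {
            return bad;
        }
        int expected = countFields(lines.get(0));
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue;
            }
            if (countFields(line) != expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CsvFieldCountCheck <file.csv>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (int lineNo : findMismatchedLines(lines)) {
            System.out.println("line " + lineNo + ": "
                    + countFields(lines.get(lineNo - 1)) + " fields, expected "
                    + countFields(lines.get(0)));
        }
    }
}
```

Run it with the CSV path as the only argument. Any line it reports is a candidate for an unquoted comma inside a text field or a trailing comma at the end of the row.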
Hope it helps. :)
Upvotes: 7