Reputation: 21
I'm using Weka for a sentiment analysis project I'm working on. I'm using Weka's CSVLoader to load the training instances from a CSV file, but for some reason, if I try to load more than 70 instances, the program throws a "java.lang.ArrayIndexOutOfBoundsException: 2" exception. I found that you can pass options to CSVLoader:
-B The size of the in memory buffer (in rows). (default: 100)
This may be the one I need to set to get rid of the error, but I'm not sure how to do that from a Java project. If anyone can help me with this, I would appreciate it greatly.
UPDATE: Changing the buffer size didn't help; the problem comes from somewhere else.
How I'm using the loader:
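// imports needed by this method:
// import java.io.File;
// import java.io.IOException;
// import weka.core.converters.CSVLoader;

private void getTrainingDataset(final String INPUT_FILENAME)
{
    try {
        // read the training dataset from the CSV file
        CSVLoader trainingLoader = new CSVLoader();
        trainingLoader.setSource(new File(INPUT_FILENAME));
        inputDataset = trainingLoader.getDataSet();
    } catch (IOException ex) {
        // report the cause instead of hiding it
        System.out.println("Exception in getTrainingDataset Method: " + ex.getMessage());
    }
}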
UPDATE: For those who want to know where the exception occurs:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at weka.core.converters.CSVLoader.getInstance(CSVLoader.java:1251)
at weka.core.converters.CSVLoader.readData(CSVLoader.java:866)
at weka.core.converters.CSVLoader.readHeader(CSVLoader.java:1150)
at weka.core.converters.CSVLoader.getStructure(CSVLoader.java:924)
at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:836)
at sentimentanalysis.SentimentAnalysis.getTrainingDataset(SentimentAnalysis.java:209)
at sentimentanalysis.SentimentAnalysis.trainClassifier(SentimentAnalysis.java:134)
at sentimentanalysis.SentimentAnalysis.main(SentimentAnalysis.java:282)
UPDATE: Even with fewer than 70 instances, the classifier also throws an error after a few. Everything works fine for around 10-20 instances, but it breaks with more :)
Upvotes: 1
Views: 1466
Reputation: 21
Weka reads the CSV twice. The first pass is limited to the buffer size (in rows) and extracts the classes of the nominal attributes; the second pass reads the entire file. The classes of each nominal attribute must match the classes of the training set (no more, no fewer). Increase the buffer size to more than the number of rows. If the error still occurs, look for a class that is not present in both files.
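To check whether this is what is happening, a rough sketch like the one below may help. It does not use Weka at all: it simply scans the CSV with plain Java and lists the values of one column that first show up after the first N data rows, i.e. labels a first pass limited to N rows would not have seen. The class name LateLabelCheck, the method lateLabels, and the naive comma split (no quoted fields containing commas, one header line) are assumptions made for illustration, not part of Weka.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Weka.
class LateLabelCheck {

    // Returns the values of column `col` (0-based) that first appear after the
    // first `bufferRows` data rows. Assumes one header line and a naive comma
    // split, so quoted fields that contain commas are not handled.
    static Set<String> lateLabels(Path csv, int col, int bufferRows) throws IOException {
        Set<String> early = new TreeSet<>();
        Set<String> late = new TreeSet<>();
        try (BufferedReader in = Files.newBufferedReader(csv)) {
            in.readLine(); // skip the header line
            String line;
            int row = 0;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines
                }
                String[] fields = line.split(",", -1);
                if (col < fields.length) {
                    String value = fields[col].trim();
                    if (row < bufferRows) {
                        early.add(value);   // seen within the buffered rows
                    } else if (!early.contains(value)) {
                        late.add(value);    // first seen after the buffered rows
                    }
                }
                // rows with too few fields are counted but otherwise skipped
                row++;
            }
        }
        return late;
    }

    public static void main(String[] args) throws IOException {
        // usage: java LateLabelCheck <file.csv> <column index> <buffer rows>
        Path csv = Paths.get(args[0]);
        int col = Integer.parseInt(args[1]);
        int bufferRows = Integer.parseInt(args[2]);
        System.out.println("Labels first seen after row " + bufferRows + ": "
                + lateLabels(csv, col, bufferRows));
    }
}
```

If this prints a non-empty set for the class column with the buffer size in use, raising -B above the number of rows in the file is the first thing to try.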
Upvotes: 2