Buffalo

Reputation: 4042

Weka output predictions

I've used the Weka GUI to train a classifier and test it on a file (making predictions), but I can't do the same with the API. The error I'm getting says that the train and test files have a different number of attributes. In the GUI, this can be solved by checking "Output predictions".

How can I do something similar using the API? Do you know of any samples out there?

import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NominalToBinary;
import weka.filters.unsupervised.attribute.Remove;

public class WekaTutorial
{

  public static void main(String[] args) throws Exception
  {
    DataSource trainSource = new DataSource("/tmp/classes - edited.arff"); // training
    Instances trainData = trainSource.getDataSet();

    DataSource testSource = new DataSource("/tmp/classes_testing.arff");
    Instances testData = testSource.getDataSet();

    if (trainData.classIndex() == -1)
    {
      trainData.setClassIndex(trainData.numAttributes() - 1);
    }

    if (testData.classIndex() == -1)
    {
      testData.setClassIndex(testData.numAttributes() - 1);
    }    

    String[] options = weka.core.Utils.splitOptions("weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 1000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -M 1 "
            + "-tokenizer \"weka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"");

    Remove remove = new Remove();
    remove.setOptions(options);
    remove.setInputFormat(trainData);

    NominalToBinary filter = new NominalToBinary(); 

    NaiveBayes nb = new NaiveBayes();

    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(filter);
    fc.setClassifier(nb);
    // train and make predictions
    fc.buildClassifier(trainData);

    for (int i = 0; i < testData.numInstances(); i++)
    {
      double pred = fc.classifyInstance(testData.instance(i));
      System.out.print("ID: " + testData.instance(i).value(0));
      System.out.print(", actual: " + testData.classAttribute().value((int) testData.instance(i).classValue()));
      System.out.println(", predicted: " + testData.classAttribute().value((int) pred));
    }

  }

}

Error:
Exception in thread "main" java.lang.IllegalArgumentException: Src and Dest differ in # of attributes: 2 != 17152

This was not an issue for the GUI.


Upvotes: 1

Views: 1320

Answers (1)

Drahoš Maďar

Reputation: 567

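You need to ensure that the attributes and categories in the train and test sets are compatible, so that both files end up with the same ARFF header. Try the following:

  • combine the train and test sets
  • preprocess the combined set
  • save it as ARFF
  • open two empty files
  • copy the header, from the top of the file through the "@data" line, into both files
  • copy the training instances into the first file and the test instances into the second file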
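The manual copy steps above could also be scripted. Below is a minimal sketch in plain Java (no Weka calls), assuming the combined set has already been preprocessed and saved as one ARFF file with one instance per line and the training rows first. The class name ArffSplit, its methods, and the command-line arguments are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: splits one preprocessed ARFF file
// into a training file and a test file that share an identical header.
class ArffSplit {

  // Returns the index of the "@data" line (case-insensitive), or -1 if absent.
  static int dataLineIndex(List<String> lines) {
    for (int i = 0; i < lines.size(); i++) {
      if (lines.get(i).trim().equalsIgnoreCase("@data")) {
        return i;
      }
    }
    return -1;
  }

  // Returns [trainLines, testLines]. Each list starts with the full header
  // (everything up to and including "@data"), followed by its share of rows.
  // Assumption: the first trainRows data rows belong to the training set.
  static List<List<String>> split(List<String> lines, int trainRows) {
    int dataIdx = dataLineIndex(lines);
    if (dataIdx < 0) {
      throw new IllegalArgumentException("No @data line found");
    }
    List<String> header = lines.subList(0, dataIdx + 1);

    // Collect data rows, skipping blank lines and "%" comment lines.
    List<String> rows = new ArrayList<>();
    for (String line : lines.subList(dataIdx + 1, lines.size())) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty() && !trimmed.startsWith("%")) {
        rows.add(line);
      }
    }
    if (trainRows < 0 || trainRows > rows.size()) {
      throw new IllegalArgumentException("trainRows out of range: " + trainRows);
    }

    List<String> train = new ArrayList<>(header);
    train.addAll(rows.subList(0, trainRows));
    List<String> test = new ArrayList<>(header);
    test.addAll(rows.subList(trainRows, rows.size()));

    List<List<String>> result = new ArrayList<>();
    result.add(train);
    result.add(test);
    return result;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 4) {
      System.err.println(
          "usage: java ArffSplit <combined.arff> <trainRows> <train.arff> <test.arff>");
      return;
    }
    List<String> lines = Files.readAllLines(Paths.get(args[0]));
    List<List<String>> parts = split(lines, Integer.parseInt(args[1]));
    Files.write(Paths.get(args[2]), parts.get(0));
    Files.write(Paths.get(args[3]), parts.get(1));
  }
}
```

Because both output files start with the same header lines, they declare the same attributes in the same order, which is what the compatibility check needs.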

Upvotes: 2
