Diverging results from Weka training and Java training

Question

I'm trying to build an "automated training" process using Weka's Java API, but I think I'm doing something wrong. Whenever I test my ARFF file through Weka's GUI using MultilayerPerceptron with 10-fold cross-validation or a 66% percentage split, I get satisfactory results (around 90%). When I test the same file through Weka's API, however, every test returns essentially a 0% match (every row returns false).

Here's the output from Weka's GUI:

=== Evaluation on test split ===
=== Summary ===

Correctly Classified Instances          78               91.7647 %
Incorrectly Classified Instances         7                8.2353 %
Kappa statistic                          0.8081
Mean absolute error                      0.0817
Root mean squared error                  0.24  
Relative absolute error                 17.742  %
Root relative squared error             51.0603 %
Total Number of Instances               85

=== Detailed Accuracy By Class ===

                TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.885     0.068      0.852     0.885     0.868      0.958    1
                 0.932     0.115      0.948     0.932     0.94       0.958    0
Weighted Avg.    0.918     0.101      0.919     0.918     0.918      0.958

=== Confusion Matrix ===

  a  b   <-- classified as
 23  3 |  a = 1
  4 55 |  b = 0

And here's the code I'm using in Java (actually it's .NET, via IKVM):

var cl = new weka.classifiers.functions.MultilayerPerceptron();
cl.setOptions(weka.core.Utils.splitOptions("-L 0.7 -M 0.3 -N 75 -V 0 -S 0 -E 20 -H a")); //the same (default) options used when the test is run in the Weka GUI

string trainingFile = Properties.Settings.Default.WekaTrainingFile; //the path to the same file I use to test on weka explorer
weka.core.Instances data = null;
data = new weka.core.Instances(new java.io.BufferedReader(new java.io.FileReader(trainingFile))); //loads the file
data.setClassIndex(data.numAttributes() - 1); //set the last column as the class attribute

cl.buildClassifier(data);

var tmp = System.IO.Path.GetTempFileName(); //creates a temp file to create an arff file with a single row with the instance I want to test taken from the arff file loaded previously
using (var f = System.IO.File.CreateText(tmp))
{
    //long code to read data from db and regenerate the line, simulating data coming from the source I really want to test
}

var dataToTest = new weka.core.Instances(new java.io.BufferedReader(new java.io.FileReader(tmp)));
dataToTest.setClassIndex(dataToTest.numAttributes() - 1);

double prediction = 0;

for (int i = 0; i < dataToTest.numInstances(); i++)
{
    weka.core.Instance curr = dataToTest.instance(i);
    weka.core.Instance inst = new weka.core.Instance(data.numAttributes());
    inst.setDataset(data);
    for (int n = 0; n < data.numAttributes(); n++)
    {
        weka.core.Attribute att = dataToTest.attribute(data.attribute(n).name());
        if (att != null)
        {
            if (att.isNominal())
            {
                if ((data.attribute(n).numValues() > 0) && (att.numValues() > 0))
                {
                    String label = curr.stringValue(att);
                    int index = data.attribute(n).indexOfValue(label);
                    if (index != -1)
                        inst.setValue(n, index);
                }
            }
            else if (att.isNumeric())
            {
                inst.setValue(n, curr.value(att));
            }
            else
            {
                throw new InvalidOperationException("Unhandled attribute type!");
            }
        }
    }
    prediction += cl.classifyInstance(inst);
}

//prediction is always 0 here, my ARFF file has two classes: 0 and 1, 92 zeroes and 159 ones

It's odd, because if I change the classifier to, say, NaiveBayes, the results match the test run in Weka's GUI.

kaz · Accepted Answer

You are using a deprecated way of reading in ARFF files. See this documentation. Try this instead:

 import weka.core.converters.ConverterUtils.DataSource;
 ...
 DataSource source = new DataSource("/some/where/data.arff");
 Instances data = source.getDataSet();

The same documentation also shows how to connect to a database directly and bypass the creation of temporary ARFF files. Alternatively, you could read from the database and create instances manually to populate the Instances object.

Finally, if simply changing the classifier type at the top of the code to NaiveBayes solved the problem, check the options for MultilayerPerceptron in your Weka GUI to see whether they differ from the defaults (different settings can cause the same classifier type to produce different results).

Update: it looks like you're using different test data in your code than in the Weka GUI (rows from a database vs. a fold of the original training file). It may also be that the particular data in your database really does look like class 0 to the MLP classifier. To check, use the Weka interface to split your training ARFF into train/test sets, then repeat the original experiment in your code. If the results match the GUI, there's a problem with your data. If they differ, we need to look more closely at the code. The function you would call is this (from the Doc):

public Instances trainCV(int numFolds, int numFold)
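
To make the role of trainCV(numFolds, numFold) and its companion testCV(numFolds, numFold) more concrete, here is a minimal plain-Java sketch of how a dataset's rows can be divided into a training part and a held-out test part for one fold. It is only an illustration and not Weka's implementation: the class name FoldSplitSketch and its helper methods are made up for this example, it works on row indices rather than Instances, and it assumes simple contiguous folds with no shuffling or stratification. The row count of 251 comes from the class counts quoted in the question (92 + 159).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; none of these names belong to Weka.
class FoldSplitSketch {

    // First row index of the test block for a 0-based fold.
    static int foldStart(int numRows, int numFolds, int fold) {
        int base = numRows / numFolds;
        int extra = numRows % numFolds; // the first `extra` folds hold one more row
        return fold * base + Math.min(fold, extra);
    }

    // Number of rows held out for testing in the given fold.
    static int foldSize(int numRows, int numFolds, int fold) {
        return numRows / numFolds + (fold < numRows % numFolds ? 1 : 0);
    }

    // Row indices used for testing in the given fold.
    static List<Integer> testIndices(int numRows, int numFolds, int fold) {
        int start = foldStart(numRows, numFolds, fold);
        int end = start + foldSize(numRows, numFolds, fold);
        List<Integer> out = new ArrayList<>();
        for (int i = start; i < end; i++) {
            out.add(i);
        }
        return out;
    }

    // Row indices used for training: every row outside the test block.
    static List<Integer> trainIndices(int numRows, int numFolds, int fold) {
        int start = foldStart(numRows, numFolds, fold);
        int end = start + foldSize(numRows, numFolds, fold);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            if (i < start || i >= end) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int numRows = 92 + 159; // class counts quoted in the question
        int numFolds = 10;
        for (int fold = 0; fold < numFolds; fold++) {
            System.out.println("fold " + fold
                    + ": train=" + trainIndices(numRows, numFolds, fold).size()
                    + " test=" + testIndices(numRows, numFolds, fold).size());
        }
    }
}
```

In real code you would call trainCV and testCV on the loaded Instances object instead, build the classifier on the training part, and classify only the held-out part. The point of the sketch is that the two parts never overlap, so a prediction on a test row is not influenced by that row having been seen during training.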
