Jose Ramon

Reputation: 5444

Evaluate the class of a sample using WEKA

I have trained a model in Weka using the SMO algorithm, and I want to use it to classify a test sample in my two-class problem. I am a bit confused about how to do this with Weka's SMO code. I have built an empty ARFF file that contains only the metadata (the header). I compute the sample's features and add the resulting vector to the data from that file. I wrote the following function, Evaluate, to classify a sample. The file template.arff is the template containing the ARFF metadata, and models/smo is my model.

public static void Evaluate(ArrayList<Float> temp) throws Exception {

    // Placeholder class value; Weka stores a nominal value as the index of its label.
    temp.add(Float.parseFloat("1"));
    System.out.println(temp.size());
    double dt[] = new double[temp.size()];
    for (int index = 0; index < temp.size(); index++) {
        dt[index] = temp.get(index);
    }

    double data[][] = new double[1][];
    data[0] = dt;
    weka.classifiers.Classifier c = loadModel(new File("models/"), "/smo"); // loads smo model

    File tmp = new File("template.arff"); //loads data template
    Instances dataset = new weka.core.converters.ConverterUtils.DataSource(tmp.getAbsolutePath()).getDataSet();
    int numInstances = data.length;

    for (int inst = 0; inst < numInstances; inst++) {
        dataset.add(new Instance(1.0, data[inst]));
    }
    dataset.setClassIndex(dataset.numAttributes() - 1);
    Evaluation eval = new Evaluation(dataset);
    //returned evaluated index
    double a = eval.evaluateModelOnceAndRecordPrediction(c, dataset.instance(0));
    double arr[] = c.distributionForInstance(dataset.instance(0));


    System.out.println(" Confidence Scores");
    for (int idx = 0; idx < arr.length; idx++) {
        System.out.print(arr[idx] + " ");
    }
    System.out.println();
}

I am not sure whether this is right. I create the sample data and then load my model. I am wondering whether this code is what I need to determine the class of the sample temp. If the code is OK, how can I extract a confidence score rather than only the binary decision about the class? The structure of the template.arff file is:

@relation Dataset
@attribute Attribute0 numeric
@attribute Attribute1 numeric
@attribute Attribute2 numeric
...
@ATTRIBUTE class {1, 2}

@data
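
For a sample whose class is not known, a row under @data can carry ? in the class position, since ARFF uses ? to mark a missing value. The following is a minimal plain-Java sketch of formatting such a row; the class name ArffRowSketch and the method toUnlabeledRow are hypothetical and not part of Weka.

```java
import java.util.StringJoiner;

public class ArffRowSketch {

    /** Formats feature values as one comma-separated ARFF data row, with "?" as the class value. */
    public static String toUnlabeledRow(double[] features) {
        StringJoiner row = new StringJoiner(",");
        for (double value : features) {
            // Double.toString always uses '.' as the decimal separator
            row.add(Double.toString(value));
        }
        row.add("?"); // ARFF marker for a missing value
        return row.toString();
    }

    public static void main(String[] args) {
        // prints "0.5,12.0,3.25,?"
        System.out.println(toUnlabeledRow(new double[] {0.5, 12.0, 3.25}));
    }
}
```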

The loadModel function is as follows:

public static SMO loadModel(File path, String name) throws Exception {

    // try-with-resources closes the streams even if deserialization fails
    try (FileInputStream fis = new FileInputStream(path + name + ".model");
         ObjectInputStream ois = new ObjectInputStream(fis)) {
        return (SMO) ois.readObject();
    }
}

I found this post here, which suggests locating the SMO.java file and changing the following line: smo.buildClassifier(train, cl1, cl2, true, -1, -1); // from false to true. However, after making that change I still got the same binary output.

My training function:

public void weka_train(File input, String[] options) throws Exception {

    long start = System.nanoTime();
    File tmp = new File("data.arff");
    TwitterTrendSetters obj = new TwitterTrendSetters();
    Instances data = new weka.core.converters.ConverterUtils.DataSource(
            tmp.getAbsolutePath()).getDataSet();
    data.setClassIndex(data.numAttributes() - 1);
    Classifier c = null;
    String ctype = null;
    boolean newmodel = false;

    ctype = "SMO";
    c = new SMO();

    for (int i = 0; i < options.length; i++) {
        System.out.print(options[i]);
    }

    c.setOptions(options);
    c.buildClassifier(data);
    newmodel = true;

    if (newmodel) {
        obj.saveModel(c, ctype, new File("models"));
    }
}

Upvotes: 3

Views: 1561

Answers (2)

christosh

Reputation: 193

Basically, you should use the "-M" option for SMO during training, so that it fits logistic models to the SVM outputs. Check the solution proposed here. It should work!
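
As a rough sketch of how this could fit the question's weka_train(File, String[] options), the helper below makes sure "-M" is in the option array before it is handed to setOptions. It is plain Java; the class name SmoOptions and the method withLogisticModels are hypothetical, and "-C" appears only as a sample SMO option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SmoOptions {

    /**
     * Returns a copy of the given options with "-M" appended if it is absent.
     * "-M" is the SMO flag for fitting logistic models to the SVM outputs.
     */
    public static String[] withLogisticModels(String[] options) {
        List<String> result = new ArrayList<>(Arrays.asList(options));
        if (!result.contains("-M")) {
            result.add("-M");
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = withLogisticModels(new String[] {"-C", "1.0"});
        // prints "-C 1.0 -M"
        System.out.println(String.join(" ", options));
    }
}
```

In weka_train the result would be passed to c.setOptions(...) before c.buildClassifier(data). As far as I can tell, the logistic models are fitted during training, so a model that was saved without them would have to be retrained and saved again.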

Upvotes: 0

applecrusher

Reputation: 5648

I have some suggestions, but I am not sure whether they will work. Let me know if they work for you.

First, declare the variable as SMO rather than as the parent Classifier type. I created a new method, loadModelSMO, as an example of this.

SMO Class

public static SMO loadModelSMO(File path, String name) throws Exception {

   // try-with-resources closes the streams even if deserialization fails
   try (FileInputStream fis = new FileInputStream(path + name + ".model");
        ObjectInputStream ois = new ObjectInputStream(fis)) {
      return (SMO) ois.readObject();
   }
}

and then

SMO c = loadModelSMO(new File("models/"), "/smo");
...

I found a mailing-list thread that might help you, titled "I used SMO with logistic regression but I always get a confidence of 1.0".

It suggests using the -M option to fit logistic models, which can be set through the method

setOptions(java.lang.String[] options)

Also, you may need to set buildLogisticModels to true (see "Confidence score in SMO"):

c.setBuildLogisticModels(true); 
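
To illustrate why this setting matters: without logistic models, a two-class SMO appears to report a hard vote (a distribution of 0 and 1), whereas with them the raw SVM output is passed through a fitted sigmoid. The sketch below is plain Java, not Weka's code. The class name LogisticCalibrationSketch and the coefficients a and b are hypothetical values chosen for illustration; Weka would fit its own coefficients from the training data.

```java
public class LogisticCalibrationSketch {

    /** Hard decision: all probability mass goes to the side of the hyperplane the sample is on. */
    public static double[] hardVote(double svmOutput) {
        return svmOutput > 0 ? new double[] {0.0, 1.0} : new double[] {1.0, 0.0};
    }

    /** Sigmoid mapping of the SVM output to a probability for the second class. */
    public static double[] calibrated(double svmOutput, double a, double b) {
        double p = 1.0 / (1.0 + Math.exp(-(a * svmOutput + b)));
        return new double[] {1.0 - p, p};
    }

    public static void main(String[] args) {
        double a = 2.0; // hypothetical slope
        double b = 0.0; // hypothetical intercept
        for (double f : new double[] {-1.5, -0.1, 0.1, 1.5}) {
            double[] hard = hardVote(f);
            double[] soft = calibrated(f, a, b);
            System.out.printf("f=%5.2f  hard=[%.0f, %.0f]  calibrated=[%.3f, %.3f]%n",
                    f, hard[0], hard[1], soft[0], soft[1]);
        }
    }
}
```

With a real SMO model the corresponding values come from c.distributionForInstance(...), as in the question's Evaluate. As far as I understand, the flag has to be set before buildClassifier is called, so setting it on a model that was loaded from disk would not change its output until the model is retrained.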

Let me know if this helped at all.

Upvotes: 3
