Aquarius24

Reputation: 1866

Evaluating the model in WEKA

I have applied a classification algorithm to a dataset and got the stats below:

Correctly Classified Instances         684               76.1693 %
Incorrectly Classified Instances       214               23.8307 %
Kappa statistic                          0     
Mean absolute error                      0.1343
Root mean squared error                  0.2582
Relative absolute error                100      %
Root relative squared error            100      %
Total Number of Instances              898     

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0         0          0         0         0          0.5      1
                 0         0          0         0         0          0.5      2
                 1         1          0.762     1         0.865      0.5      3
                 0         0          0         0         0          ?        4
                 0         0          0         0         0          0.5      5
                 0         0          0         0         0          0.5      U
Weighted Avg.    0.762     0.762      0.58      0.762     0.659      0.5  

=== Confusion Matrix ===

   a   b   c   d   e   f   <-- classified as
   0   0   8   0   0   0 |   a = 1
   0   0  99   0   0   0 |   b = 2
   0   0 684   0   0   0 |   c = 3
   0   0   0   0   0   0 |   d = 4
   0   0  67   0   0   0 |   e = 5
   0   0  40   0   0   0 |   f = U

I can understand much of the data, but since I am new to Weka I have trouble interpreting the values:

1. Which error rate should I report overall?
2. How do I tell whether there is something interesting about the model?

Upvotes: 1

Views: 1271

Answers (2)

Rebecca Morgan

Reputation: 41

The ROC area is also useful for evaluating accuracy and for judging how interesting a model is. Simply speaking, the true positive rate is plotted against the false positive rate, and the ROC area is the area under this curve. A high ROC area, say 0.9 to 1, indicates that the model is very good at classifying instances, whereas an ROC area of 0.5 (as in your model) means the model is no better at classification than a random method such as flipping a coin.
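To make this concrete, here is a small sketch (not from the answer itself; the data points are hypothetical) that computes the ROC area with the trapezoidal rule over (FPR, TPR) operating points. A classifier that always predicts the same class has only the trivial operating points (0, 0) and (1, 1), which is exactly the diagonal with area 0.5:

```python
def roc_auc(points):
    """points: (fpr, tpr) pairs sorted by fpr, including (0, 0) and (1, 1)."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2.0  # area of one trapezoid
    return area

# Constant predictor: the curve degenerates to the diagonal.
print(roc_auc([(0.0, 0.0), (1.0, 1.0)]))  # 0.5

# A better classifier bows toward the top-left corner (hypothetical point):
print(roc_auc([(0.0, 0.0), (0.1, 0.9), (1.0, 1.0)]))  # 0.9
```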

Upvotes: 2

dedek

Reputation: 8301

1) Overall error measure

The triplet of Precision, Recall, and F-Measure is often reported together, because each number captures a different aspect of the model's performance.

If you would like a single number only, then take the Percent (In)correctly Classified Instances or the Weighted Avg. F-Measure.
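Both single numbers can be recomputed directly from the confusion matrix in the question (rows = actual class, columns = predicted class). This sketch reproduces Weka's 76.17 % accuracy and the 0.659 weighted F-Measure:

```python
# Confusion matrix from the question: classes 1, 2, 3, 4, 5, U.
cm = [
    [0, 0,   8, 0, 0, 0],  # a = 1
    [0, 0,  99, 0, 0, 0],  # b = 2
    [0, 0, 684, 0, 0, 0],  # c = 3
    [0, 0,   0, 0, 0, 0],  # d = 4
    [0, 0,  67, 0, 0, 0],  # e = 5
    [0, 0,  40, 0, 0, 0],  # f = U
]

n = sum(map(sum, cm))
accuracy = sum(cm[i][i] for i in range(len(cm))) / n

weighted_f = 0.0
for i in range(len(cm)):
    tp = cm[i][i]
    actual = sum(cm[i])                 # row total: instances of class i
    predicted = sum(r[i] for r in cm)   # column total: predictions of class i
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    weighted_f += (actual / n) * f      # weight each class by its prevalence

print(f"accuracy   = {accuracy:.4%}")   # 76.1693%
print(f"weighted F = {weighted_f:.3f}") # 0.659
```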

The other error measures are also useful, but they require deeper knowledge of statistics (which I'm lacking :-)).

2) Something interesting about the model

From the Detailed Accuracy By Class and the Confusion Matrix you can see that the model is very simple: it classifies everything as class 3. The error measures look quite good, but only because 76% of the instances in the dataset belong to class 3. The model corresponds to the commonly used baseline called "most common class" (ZeroR in Weka).
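This also explains the Kappa statistic of 0 in the question's output. A quick sketch using the class counts from the confusion matrix shows that a constant "most common class" predictor reproduces both the 76.17 % accuracy and Kappa = 0, since its expected agreement equals its observed agreement:

```python
# Class counts taken from the question's confusion matrix row totals.
counts = {"1": 8, "2": 99, "3": 684, "4": 0, "5": 67, "U": 40}
n = sum(counts.values())

# Baseline: always predict the majority class.
majority = max(counts, key=counts.get)           # class "3"
p_o = counts[majority] / n                       # observed agreement (accuracy)
print(f"baseline accuracy = {p_o:.4%}")          # 76.1693%

# Cohen's kappa: for a constant predictor, the expected agreement p_e is
# simply the majority class's prevalence, so kappa collapses to 0.
p_e = counts[majority] / n
kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa}")                        # 0.0
```

So a Kappa of 0 is a strong hint that the model adds nothing over chance agreement with the class distribution.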

Upvotes: 3
