Nidale

Reputation: 69

weka confusion matrix and accuracy analysis

How do I analyze the confusion matrix in Weka with regard to the accuracy obtained? We know that accuracy can be misleading on imbalanced data sets. How does the confusion matrix "confirm" the accuracy?

Examples: a) accuracy 96.1728 %

   a   b   c   d   e   f   g   <-- classified as
 124   0   0   0   1   0   0 |   a = brickface
   0 110   0   0   0   0   0 |   b = sky
   1   0 119   0   2   0   0 |   c = foliage
   1   0   0 107   2   0   0 |   d = cement
   1   0  12   7 105   0   1 |   e = window
   0   0   0   0   0  94   0 |   f = path
   0   0   1   0   0   2 120 |   g = grass

b) accuracy : 96.8 %

   a   b   c   d   e   f   g   <-- classified as
 202   0   0   0   3   0   0 |   a = brickface
   0 220   0   0   0   0   0 |   b = sky
   0   0 198   0  10   0   0 |   c = foliage
   0   0   1 202  16   1   0 |   d = cement
   2   0  11   2 189   0   0 |   e = window
   0   0   0   2   0 234   0 |   f = path
   0   0   0   0   0   0 207 |   g = grass

etc...

Upvotes: 2

Views: 2983

Answers (3)

Ashok Kumar Jayaraman

Reputation: 3085

Accuracy is the proportion of correct predictions out of the total number of predictions. It is calculated as

Accuracy = (124+110+119+107+105+94+120)/(124+0+0+0+1+0+0+0+110+0+0+0+0+0+1+0+119+0+2+0+0+1+0+0+107+2+0+0+1+0+12+7+105+0+1+0+0+0+0+0+94+0+0+0+1+0+0+2+120)
Accuracy = 779/810 = 0.961728

Similarly,

Accuracy = (202+220+198+202+189+234+207)/(202+0+0+0+3+0+0+0+220+0+0+0+0+0+0+0+198+0+10+0+0+0+0+1+202+16+1+0+2+0+11+2+189+0+0+0+0+0+2+0+234+0+0+0+0+0+0+0+207)
Accuracy = 1452/1500 = 0.968
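The same computation can be sketched in Python (outside Weka) with NumPy: the accuracy is the trace of the confusion matrix divided by the sum of all its entries. The matrix below is example a) from the question.

```python
# Accuracy from a confusion matrix: main-diagonal sum divided by
# the sum of all entries (example a from the question).
import numpy as np

cm = np.array([
    [124,   0,   0,   0,   1,   0,   0],  # a = brickface
    [  0, 110,   0,   0,   0,   0,   0],  # b = sky
    [  1,   0, 119,   0,   2,   0,   0],  # c = foliage
    [  1,   0,   0, 107,   2,   0,   0],  # d = cement
    [  1,   0,  12,   7, 105,   0,   1],  # e = window
    [  0,   0,   0,   0,   0,  94,   0],  # f = path
    [  0,   0,   1,   0,   0,   2, 120],  # g = grass
])

accuracy = np.trace(cm) / cm.sum()  # 779 / 810
print(f"{accuracy:.6f}")            # -> 0.961728
```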

Upvotes: 0

upepo

Reputation: 41

   a   b   c   d   e   f   g   <-- classified as
 124   0   0   0   1   0   0 |   a = brickface
...

It means there are 125 instances of class a (brickface): 124 of them are classified as a (correct) and 1 is classified as e (incorrect).

If you think your data is imbalanced, use the AUC (ROC Area) score instead; it is better suited to imbalanced data sets.
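A minimal sketch of why AUC is the safer metric here, using scikit-learn rather than Weka (Weka reports the same quantity as "ROC Area" in its per-class output). The data below is made up for illustration: 95 negatives, 5 positives, and a classifier that gives every instance the same score.

```python
# Accuracy vs. AUC on imbalanced data (illustrative toy data).
from sklearn.metrics import accuracy_score, roc_auc_score

y_true  = [0] * 95 + [1] * 5      # 95 negatives, 5 positives
y_pred  = [0] * 100               # always predicts the majority class
y_score = [0.1] * 100             # constant scores: no discrimination

print(accuracy_score(y_true, y_pred))   # -> 0.95  (looks great)
print(roc_auc_score(y_true, y_score))   # -> 0.5   (chance level)
```

Accuracy rewards the trivial majority-class rule; AUC exposes that the classifier cannot rank a positive above a negative any better than chance.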

Upvotes: 2

Jose Maria Gomez Hidalgo

Reputation: 1061

The accuracy is computed by summing up all instances on the main diagonal and dividing by the total number of instances (all entries of the confusion matrix). For instance, in a), you get 124 + 110 + ... + 120 = 779, and the total number of instances (summing everything) is 810, so the accuracy is 0.9617 => 96.17%.

Your datasets are rather balanced (all the classes have approximately the same number of instances). You can see that a dataset is imbalanced when the sum of one row is much bigger than the sum of other rows, as rows represent actual classes. For instance:

   a    b  <-- classified as
1000   20 |  a = class1
  10   10 |  b = class2

In this case, class1 has 1020 instances and class2 has only 20, so the problem is highly imbalanced. This will impact classifier performance, as learning algorithms typically try to maximize the accuracy (or minimize the error), so a trivial classifier, e.g. the rule "for any X, set class = class1", will have an accuracy of 1020/1040 ≈ 0.9808.
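The arithmetic for that trivial classifier can be checked directly from the row sums of the matrix above:

```python
# Row sums give the actual class counts; the "always class1" rule
# gets exactly the class1 instances right and nothing else.
class1_total = 1000 + 20   # row a: actual class1 instances
class2_total = 10 + 10     # row b: actual class2 instances

trivial_accuracy = class1_total / (class1_total + class2_total)
print(round(trivial_accuracy, 4))  # -> 0.9808
```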

Upvotes: 3
