How to calculate classification error rate

Alright. Now this question is pretty hard. I am going to give you an example.

The left numbers are my algorithm's classification and the right numbers are the original class numbers.

Here my algorithm merged two different classes into one: as you can see, it merged classes 86 and 89 into a single class. So what would the error be in the example above?

Or here is another example:

In the example above, the left numbers are my algorithm's classification and the right numbers are the original class ids. As can be seen, it misclassified 3 products (I am classifying identical commercial products). What would the error rate be in this example, and how would you calculate it?

This question is pretty hard and complex. We have finished the classification, but we have not been able to find the right way to calculate the success rate :D

Upvotes: 4

Answers (4)

Sibelius Seraphini

Reputation: 5613

Classification Error Rate (CER) is 1 - Purity, where purity is the fraction of points that belong to the majority class of their cluster (see http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html).

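# Purity: for each cluster (column of the contingency table), take the count
# of its most frequent true class, sum these, and divide by the number of points.
ClusterPurity <- function(clusters, classes) {
    sum(apply(table(classes, clusters), 2, max)) / length(clusters)
}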

(Code from @john-colby.) Or, to get the error rate directly:

# Classification error rate: 1 - purity.
CER <- function(clusters, classes) {
    1 - sum(apply(table(classes, clusters), 2, max)) / length(clusters)
}

Upvotes: -2

denis

Reputation: 21947

Here's a longish example: a real confusion matrix with 10 input classes "0" - "9" (handwritten digits) and 10 output clusters labelled A - J.

Confusion matrix for 5620 optdigits:

True 0 - 9 down, clusters A - J across
-----------------------------------------------------
      A    B    C    D    E    F    G    H    I    J
-----------------------------------------------------
0:    2         4         1       546    1
1:   71  249        11    1    6            228    5
2:   13    5        64    1   13    1       460
3:   29    2       507        20         5    9
4:        33  483         4   38         5    3    2
5:    1    1    2   58    3            480   13
6:    2    1    2       294         1         1  257
7:    1    5    1            546         6    7
8:  415   15    2    5    3   12        13   87    2
9:   46   72    2  357        35    1   47    2
----------------------------------------------------
    580  383  496 1002  307  670  549  557  810  266  estimates in each cluster

y class sizes: [554 571 557 572 568 558 558 566 554 562]
kmeans cluster sizes: [ 580  383  496 1002  307  670  549  557  810  266]

For example, cluster A has 580 data points, 415 of which are "8"s; cluster B has 383 data points, 249 of which are "1"s; and so on.

The problem is that the output classes are scrambled, permuted; they correspond in this order, with counts:

      A    B    C    D    E    F    G    H    I    J
      8    1    4    3    6    7    0    5    2    6
    415  249  483  507  294  546  546  480  460  257

One could say that the "success rate" is 75 % = (415 + 249 + 483 + 507 + 294 + 546 + 546 + 480 + 460 + 257) / 5620,
but this throws away useful information: here, that E and J both say "6" and no cluster says "9".

So, add up the biggest number in each column of the confusion matrix and divide by the total.
But how should overlapping / missing clusters be counted, like the two "6"s and no "9" here?
I don't know of a commonly agreed-upon way (I doubt that the Hungarian algorithm is used in practice).
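As a rough illustration of the arithmetic above, the Python sketch below takes the per-cluster majority digits and counts quoted earlier in this answer, recomputes the 75 % figure, and lists which true classes are claimed by more than one cluster or by none. The function name `summarize_majorities` and its return layout are my own choices for this sketch, not part of any library.

```python
from collections import Counter

# Majority digit and its count for clusters A-J, copied from the table above.
majority_label = [8, 1, 4, 3, 6, 7, 0, 5, 2, 6]
majority_count = [415, 249, 483, 507, 294, 546, 546, 480, 460, 257]
total_points = 5620


def summarize_majorities(labels, counts, total, all_classes):
    """Return (success_rate, duplicated_classes, missing_classes)."""
    # Sum of column maxima divided by the number of data points.
    success_rate = sum(counts) / total
    # How many clusters name each true class as their majority.
    times_claimed = Counter(labels)
    duplicated = sorted(c for c, n in times_claimed.items() if n > 1)
    missing = sorted(c for c in all_classes if c not in times_claimed)
    return success_rate, duplicated, missing


rate, duplicated, missing = summarize_majorities(
    majority_label, majority_count, total_points, range(10)
)
print(f"success rate: {rate:.3f}")
print("claimed by several clusters:", duplicated)
print("claimed by no cluster:", missing)
```

Reporting the duplicated and missing classes next to the single rate keeps some of the information that the rate alone hides.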

Bottom line: don't throw away information; look at the whole confusion matrix.

NB: such a "success rate" will be optimistic for new data!
It's customary to split the data into, say, a 2/3 "training set" and a 1/3 "test set", train e.g. k-means on the 2/3 alone,
then measure confusion / success rate on the test set, which is generally worse than on the training set alone.
Much more can be said; see e.g. Cross-validation.
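A minimal sketch of such a split, assuming the data points can be addressed by index: `holdout_split` is a hypothetical helper written for this example, and the clustering and scoring steps are left out.

```python
import random


def holdout_split(items, test_fraction=1 / 3, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in indices for the 5620 data points; real code would use them
# to select the matching feature rows and true labels.
indices = list(range(5620))
train_idx, test_idx = holdout_split(indices)
print(len(train_idx), len(test_idx))
```

The model would be fitted on the rows in `train_idx` only, and the confusion matrix computed on the rows in `test_idx`.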

Upvotes: 5

unsym

Reputation: 2200

You have to define an error metric yourself. In your case, a simple approach would be to map each product to its properties:

p = properties(id)

where id is the product id and p is likely to be a vector whose entries are the different properties. Then you can define the error function e (or distance) between two products as

e = d(p1, p2)

Of course, each property must be reduced to a number in this function. The error function can then be used in the classification algorithm and in learning.
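To make the two formulas concrete, here is a small Python sketch. The product ids, the property values, and the choice of Euclidean distance for d are all assumptions made up for illustration; any distance that suits your properties would do.

```python
import math

# Hypothetical numeric property vectors keyed by product id (made-up values).
PROPERTIES = {
    1: [1.0, 0.0, 3.5],
    2: [1.0, 0.0, 3.0],
    3: [0.0, 1.0, 9.0],
}


def properties(product_id):
    """p = properties(id): look up the property vector of a product."""
    return PROPERTIES[product_id]


def d(p1, p2):
    """e = d(p1, p2): Euclidean distance, one possible error function."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))


print(d(properties(1), properties(2)))  # similar products, small distance
print(d(properties(1), properties(3)))  # dissimilar products, larger distance
```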

In your second example, it seems that you treat the pair (203 7) as a successful classification, so I think you already have a metric of your own. If you are more specific, you may get a better answer.

Upvotes: 0

dfb

Reputation: 13289

You have to define the error criterion if you want to evaluate the performance of an algorithm, so I'm not sure exactly what you're asking. In some clustering and machine learning algorithms, you define the error metric and the algorithm minimizes it.

Take a look at https://en.wikipedia.org/wiki/Confusion_matrix to get some ideas.
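As one possible starting point, the Python sketch below builds confusion counts from (cluster, true class) pairs and derives an error rate from the majority class of each cluster. The pairs are made up to mimic the merged classes 86 and 89 from the question, and the majority-based criterion is only one choice among several.

```python
from collections import Counter

# Made-up (cluster, true class) pairs; cluster 1 merges true classes 86 and 89.
pairs = [(1, 86), (1, 86), (1, 86), (1, 89), (1, 89), (2, 90), (2, 90), (3, 91)]


def confusion_counts(pairs):
    """Count how often each (cluster, true class) combination occurs."""
    return Counter(pairs)


def majority_error_rate(pairs):
    """1 - (sum over clusters of the largest true-class count) / total."""
    best = {}
    for (cluster, _true_class), n in confusion_counts(pairs).items():
        best[cluster] = max(best.get(cluster, 0), n)
    return 1 - sum(best.values()) / len(pairs)


for (cluster, true_class), n in sorted(confusion_counts(pairs).items()):
    print(f"cluster {cluster}  true {true_class}  count {n}")
print("error rate:", majority_error_rate(pairs))
```

Under this criterion, the two items of class 89 that were merged into cluster 1 are the only ones counted as errors.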

Upvotes: 0
