Alessandro

Reputation: 794

Confidence estimation in SVM (one-vs-all) for multiclass classification

When using SVM-OvR (One-vs-Rest) for multiclass classification, n classifiers are trained, where n is the number of classes. The i-th classifier essentially performs a binary classification between class i and the class containing all the others.

Then, in order to predict a new data sample, all n classifiers are evaluated, and the most probable class is returned based on the confidence estimated by each classifier. For example, if class1 = 0.59, class2 = 0.61, and so on, the class with the largest associated probability is the output class.
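A minimal sketch of that decision rule, assuming we already have one confidence score per class (the scores here are made up):

    # Hypothetical per-class confidence scores from the n OvR classifiers
    scores = {"class1": 0.59, "class2": 0.61, "class3": 0.33}

    # The predicted class is the one with the largest confidence
    predicted = max(scores, key=scores.get)
    print(predicted)  # class2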

I'm wondering how exactly the confidence is computed for each classifier. I have tried to read the SVC documentation, but I can't see how the predict function evaluates each classifier. In other words, if class1 = 0.59, how is 0.59 calculated? What is the raw value from which it is generated? Is it the Euclidean distance of the sample from the hyperplane?

Upvotes: 2

Views: 686

Answers (1)

Yahya

Reputation: 14102

That is achieved by Platt Scaling (also known as Platt Calibration).

Platt Scaling is an algorithm that transforms the outputs of these multiple classifiers into a probability distribution over the classes.

It is given by:

P(y = 1 | x) = 1 / (1 + exp(A · f(x) + B))

Where f(x) is the SVM output, and A and B are scalars learned by the algorithm.
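As a minimal sketch of that formula (the values of A, B, and the raw scores below are made up; in practice A and B are fitted to held-out data):

    import numpy as np

    def platt_scale(f, A, B):
        # Platt scaling: map a raw SVM score f(x) to a probability
        return 1.0 / (1.0 + np.exp(A * f + B))

    # Hypothetical fitted parameters and raw decision-function scores
    A, B = -1.5, 0.2
    raw_scores = np.array([-2.0, -0.3, 0.4, 1.8])
    print(platt_scale(raw_scores, A, B))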

Of course, Scikit-learn might use a variant of this, but this is the main idea.

For more details, I refer you to the original paper.


Update

Based on your comment below, f(x) is simply the classifier score, i.e. the decision function output. For SVC it is f(x) = θᵀg(x) + b (the weights multiplied by some mapping of the input, plus a bias term); this is where the f(x) values from the classifiers come from.
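To make that concrete, here is a small sketch with scikit-learn (the dataset and parameters are arbitrary): decision_function returns the raw f(x) scores, while predict_proba returns the calibrated probabilities, which are only available when probability=True:

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # probability=True enables Platt-style calibration internally
    clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

    print(clf.decision_function(X[:2]))  # raw per-class scores f(x)
    print(clf.predict_proba(X[:2]))      # calibrated class probabilities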

Now those scores from each classifier are plugged into the Platt Scaling formula above to turn them into probabilities.

Please note that Platt Scaling is performed via cross-validation, to avoid the overfitting that could result from fitting the parameters A and B in the formula above (which makes it more computationally expensive). Also note that Scikit-learn uses libsvm, which is written in C, for this purpose.
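If you want that calibration step to be explicit rather than hidden inside SVC(probability=True), scikit-learn also exposes it through CalibratedClassifierCV; a rough sketch (the base model and cv value are arbitrary choices):

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import load_iris
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)

    # method="sigmoid" is Platt scaling, fitted with 5-fold cross-validation
    base = LinearSVC(max_iter=10000)  # has decision_function, no predict_proba
    calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5).fit(X, y)

    print(calibrated.predict_proba(X[:2]))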

Upvotes: 2
