Vinayak Agarwal

Reputation: 1410

Interpretation of Probability Estimate for Multi-class classification in LibSVM for MATLAB

Problem: 3 class classification with labels 1,2,3.

Tool: LibSVM for MATLAB

svmModel = svmtrain(<TrainclassLabels>, <Trainfeatures>, '-b 1 -c <someCValue> -g <someGammaValue>');
[predLabels, classAccuracy, probEstimates] = svmpredict(<TestClassLabels>, <TestFeatures>, svmModel, '-b 1');

After this step, the first ten rows of probEstimates are:

0.9129    0.0749    0.0122
0.9059    0.0552    0.0389
0.8231    0.0183    0.1586
0.9077    0.0098    0.0825
0.9074    0.0668    0.0257
0.8685    0.0146    0.1169
0.8962    0.0664    0.0374
0.9074    0.0548    0.0377
0.9474    0.0054    0.0472
0.9178    0.0642    0.0180

but the first ten predicted labels are:

 2
 2
 2
 2
 2
 2
 2
 2
 2
 2

Questions:

  1. My understanding was that each probability estimate is the probability that an item belongs to a particular class, given its feature vector. If that were true, these items should belong to class 1, not class 2. Does LibSVM change the order of the classes, or am I missing something? If I am wrong, can someone please explain the correct interpretation of the probability estimates?

  2. If I want to move the decision boundary to increase the precision of class 1 (that is, predict fewer items as class 1 and so be more conservative), which of these class probabilities should I work with, and how?

Upvotes: 4

Views: 2464

Answers (2)

Phan

Reputation: 1

The order of the labels stored in the model may differ from what you expect. You can check it with svmModel.Label. The columns of the probability estimates follow this order.
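
As a minimal sketch of how the columns could be mapped back to class labels, the snippet below uses plain MATLAB with no LibSVM call. The vector labelOrder is a hypothetical stand-in for svmModel.Label; the value [2; 1; 3] is an assumption chosen to match the output in the question, not something read from the asker's model.

```matlab
% Hypothetical stand-in for svmModel.Label (assumed, not taken from a real model).
labelOrder = [2; 1; 3];

% First row of probEstimates from the question.
probRow = [0.9129 0.0749 0.0122];

% Column k of probEstimates belongs to class labelOrder(k).
[~, col] = max(probRow);
predicted = labelOrder(col);      % class with the highest probability

% Reorder the columns so that column j holds the probability of class j.
[~, perm] = sort(labelOrder);
probByClass = probRow(:, perm);
```

Under that assumed label order, predicted is 2, which agrees with the labels reported in the question, and probByClass(1) is the probability of class 1.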

Upvotes: 0

feirainy

Reputation: 507

I came across the same problem recently. The cause is the order of the training data. If you want the index into the posterior probability vector to match the class label, sort the training data by label before training.

For example, if the first training point is labeled 4, then the first entry of the posterior probability vector corresponds to class 4.
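
The ordering rule described here can be illustrated without calling LibSVM. This is only a sketch: trainLabels and trainFeatures are made-up placeholder data, and unique(..., 'stable') is used to mimic the order-of-first-appearance rule rather than to reproduce LibSVM's internals.

```matlab
% Hypothetical training set; the first sample happens to be labeled 2.
trainLabels = [2; 2; 1; 3; 1; 3];
trainFeatures = rand(6, 4);       % placeholder feature matrix

% Labels in order of first appearance: the column order to expect
% if training is run on the data as-is.
firstSeen = unique(trainLabels, 'stable');

% Sort the rows by label, keeping features aligned with their labels.
[sortedLabels, idx] = sort(trainLabels);
sortedFeatures = trainFeatures(idx, :);

% After sorting, the order of first appearance is ascending.
sortedOrder = unique(sortedLabels, 'stable');
```

Here firstSeen is [2; 1; 3], while sortedOrder is [1; 2; 3]. Passing sortedLabels and sortedFeatures to svmtrain should then give probability columns in ascending label order, which can be confirmed against svmModel.Label.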

Upvotes: 7
