Weird behavior while training an SVM classifier

Question

I am searching for the best value of C (the cost parameter) for training my SVM classifier. Here is my code:

clear all; close all; clc

% Load training features and labels
[y, x] = libsvmread('training_data.train'); %the training dataset is named training_data.train


cost = 2.^[-7 -5 -3 -1 1 3 5 7 9 11 13 15]; % candidate C values, 2^-7 ... 2^15
accuracy = zeros(1, numel(cost)); % cross-validation accuracy for each element of cost

for i = 1:numel(cost)
  % '-v 3' makes svmtrain return the 3-fold cross-validation accuracy (in %)
  % instead of a model; %g prints fractional C values such as 2^-7 as-is
  opt = sprintf('-c %g -v 3', cost(i));
  accuracy(i) = svmtrain(y, x, opt);
end

accuracy

I am using the LIBSVM library. When I run this program, the accuracy array is populated with pretty weird values. Here is the output:

Columns 1 through 8:

67.335 93.696 91.404 92.550 93.696 93.553 93.553 93.553

Columns 9 through 12:

93.553 93.553 93.553 93.553

This means that I get the highest cross-validation accuracy at C = 2^-5 (tied with C = 2^1). Shouldn't the highest value of C give the highest accuracy? (As far as I understand, C is a penalty factor for misclassification.) Is this behavior expected? (I am building a classifier for breast cancer identification using the UCI ML database.)
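To read the best C directly off the printed results, the accuracies can be fed to `max`. This is a minimal sketch that hard-codes the twelve values from the output above; the tie-breaking toward the smallest C is an assumed heuristic, not something LIBSVM does for you.

```matlab
% Cross-validation accuracies copied from the output above (C = 2^-7 ... 2^15)
cost     = 2.^[-7 -5 -3 -1 1 3 5 7 9 11 13 15];
accuracy = [67.335 93.696 91.404 92.550 93.696 93.553 ...
            93.553 93.553 93.553 93.553 93.553 93.553];

% max returns the FIRST index of the maximum, i.e. the smallest C that
% reaches the best accuracy (ties are broken toward the simpler model)
[best_acc, best_idx] = max(accuracy);
best_c = cost(best_idx);

% Every C value that ties with the best accuracy
tied_c = cost(accuracy == best_acc);

fprintf('best accuracy %.3f%% at C = 2^%d\n', best_acc, log2(best_c));
fprintf('C values tied at the best accuracy (log2): %s\n', mat2str(log2(tied_c)));
```

Note that the gap between 93.696 and 93.553 is small for a 3-fold estimate, so rerunning the search (the fold split is random) may reorder the top candidates.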

lejlot · Accepted Answer

Should I get the highest accuracy at the highest value of C? (As far as I understand, it is a penalty factor for misclassification.)

No, there is no such guarantee. The SVM cost is not accuracy-based: it minimizes a surrogate loss (the hinge loss) that only roughly tracks accuracy, and the 3-fold cross-validation estimate itself fluctuates with the random split of the data, so you can expect many random fluctuations. In general, you should expect high accuracy for high C, but not necessarily the highest.
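A small illustration of why a surrogate loss and accuracy can disagree: the sketch below compares the hinge loss and the accuracy of two hypothetical classifiers on four made-up points. The decision values `f_A` and `f_B` are invented for illustration only and do not come from the breast cancer data.

```matlab
% Hypothetical toy example: true labels and the decision values f(x)
% produced by two different classifiers A and B (numbers are made up)
y   = [1 1 1 1];
f_A = [0.1 0.1 0.1 0.1];   % every point correct, but all inside the margin
f_B = [3 3 3 -0.2];        % three points far on the correct side, one error

% Hinge loss (the SVM surrogate): sum of max(0, 1 - y .* f)
hinge = @(f) sum(max(0, 1 - y .* f));

% Accuracy: fraction of points whose predicted sign matches the label
acc = @(f) mean(sign(f) == y);

fprintf('A: hinge loss %.1f, accuracy %.0f%%\n', hinge(f_A), 100 * acc(f_A));
fprintf('B: hinge loss %.1f, accuracy %.0f%%\n', hinge(f_B), 100 * acc(f_B));
```

Classifier B has the lower hinge loss (1.2 versus 3.6) yet the lower accuracy (75% versus 100%), so minimizing the surrogate does not guarantee maximizing accuracy.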

Is this behavior expected? (I am building a classifier for breast cancer identification using the UCI ML database.)

Yes, it is a possible outcome.
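For reference, this is the standard soft-margin C-SVC primal that LIBSVM solves for `-s 0` (the default), where $\phi$ is the kernel feature map and $l$ the number of training points. It shows what C actually weights: the total slack, not the accuracy on held-out folds.

$$
\min_{w,\,b,\,\xi}\ \frac{1}{2}\, w^\top w + C \sum_{i=1}^{l} \xi_i
\quad \text{subject to} \quad
y_i \left( w^\top \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0 .
$$

A misclassified training point has $y_i (w^\top \phi(x_i) + b) < 0$, hence $\xi_i > 1$, so $\sum_i \xi_i$ upper-bounds the number of training errors. A larger C therefore pushes toward fewer training errors at the expense of margin width; it says nothing directly about the cross-validation accuracy that `-v 3` reports.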
