Reputation: 4642
I am searching for the best value of C (Cost parameter) for training my SVM classifier. Here is my code:
clear all; close all; clc
% Load training features and labels
[y, x] = libsvmread('training_data.train'); %the training dataset is named training_data.train
cost=[2^-7,2^-5,2^-3,2^-1,2^1,2^3,2^5,2^7,2^9,2^11,2^13,2^15];
accuracy=zeros(1,length(cost)); %This array will store the accuracy values corresponding to each element in the cost array
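for i = 1:length(cost)
    % use %g rather than %i so that fractional costs such as 2^-7 are formatted as plain decimals
    opt = sprintf('-c %g -v 3',cost(i));
    accuracy(i)=svmtrain(y,x,opt); % with -v, svmtrain returns the cross-validation accuracy
end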
accuracy
I am using the LIBSVM library. When I run this program, the accuracy array is populated with pretty weird values. Here is the output:
Columns 1 through 8:
67.335 93.696 91.404 92.550 93.696 93.553 93.553 93.553
Columns 9 through 12:
93.553 93.553 93.553 93.553
This means that I get the highest cross-validation accuracy at C = 2^-5 (tied with C = 2^1). Should I get the highest accuracy on the highest value of C? (As much as I understand, it is a penalty factor for misclassification). Is this behavior expected of it? (I am building a classifier for breast cancer identification using the UCI ML database).
Upvotes: 0
Views: 43
Reputation: 66805
Should I get the highest accuracy on the highest value of C? (As much as I understand, it is a penalty factor for misclassification).
No, there is no guarantee. The SVM objective is not accuracy-based: C weights a surrogate loss (the hinge loss) computed on the training data, and that surrogate only roughly behaves like accuracy. What you report is cross-validation accuracy on held-out folds, so you can expect random fluctuations from one value of C to the next. In general you should expect high accuracy for high C, but not necessarily the highest.
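A minimal sketch of why the surrogate and accuracy can disagree. The labels and decision values below are made-up toy numbers (not from the question's data), and classifiers "A" and "B" are hypothetical. The hinge loss max(0, 1 - y*f(x)) is the loss that C-SVC penalizes.

```matlab
% Toy labels and decision values f(x) for two hypothetical classifiers
y  = [ 1   1   1   -1   -1   -1  ];
fA = [ 2   2  -0.1 -2   -2    0.1];  % confident, but wrong on two points
fB = [ 0.2 0.2 0.2 -0.2 -0.2 -0.2];  % right everywhere, but inside the margin

hinge = @(y, f) max(0, 1 - y .* f);      % surrogate loss per sample
acc   = @(y, f) 100 * mean(y .* f > 0);  % accuracy in percent

hingeA = sum(hinge(y, fA));  accA = acc(y, fA);
hingeB = sum(hinge(y, fB));  accB = acc(y, fB);

fprintf('A: total hinge loss = %.1f, accuracy = %.1f%%\n', hingeA, accA);
fprintf('B: total hinge loss = %.1f, accuracy = %.1f%%\n', hingeB, accB);
```

In this toy case A has the lower total hinge loss but the lower accuracy, so pushing the surrogate down (which is what a larger C emphasizes) does not have to push accuracy up.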
Is this behavior expected of it? (I am building a classifier for breast cancer identification using the UCI ML database).
Yes, it is a possible outcome.
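To put the size of the gap in perspective, here is a rough arithmetic sketch. The sample count n = 698 and the counts of correctly classified samples are assumptions inferred from the printed percentages; the question does not state them.

```matlab
% ASSUMPTION: n and the counts are inferred, not given in the question.
n       = 698;                         % assumed number of training samples
correct = [470 654 638 646 654 653];   % assumed correct counts for the first six C values

acc = 100 * correct / n;               % accuracy in percent for each assumed count
gap = acc(2) - acc(6);                 % best value (C = 2^-5) vs. the plateau at large C

fprintf('accuracies: %s\n', sprintf('%.3f ', acc));
fprintf('gap = %.3f points = %d sample(s)\n', gap, correct(2) - correct(6));
```

Under that assumption, the best accuracy and the plateau at large C differ by a single sample, which is well within what a different random fold split could change.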
Upvotes: 1