Reputation: 249
I am following the example on this page: Example of 10-fold SVM classification in MATLAB.
Basically, I am following that example to run my own classification. The problem is that pred is always positive; the classifier never predicts the negative class.
    clear all;
    clc;
    load('C:\Users\HP\Documents\MATLAB\TrainLabel');
    load('C:\Users\HP\Documents\MATLAB\TrainVec');

    % Assign every sample to one of 10 folds
    cvFolds = crossvalind('Kfold', TrainLabel, 10);
    cp = classperf(TrainLabel);   % accumulates results over all folds
    for i = 1:10
        testIdx = (cvFolds == i);   % samples held out in this fold
        trainIdx = ~testIdx;        % remaining samples are used for training
        % Model = svmtrain(TrainVec(trainIdx,:), TrainLabel(trainIdx),'showplot',true);
        Model = svmtrain(TrainVec(trainIdx,:), TrainLabel(trainIdx), ...
            'Autoscale',true, 'Showplot',false, 'Method','QP', ...
            'BoxConstraint',2e-1, 'Kernel_Function','rbf', 'RBF_Sigma',1);
        pred = svmclassify(Model, TrainVec(testIdx,:),'Showplot',false);
        cp = classperf(cp, pred, testIdx);   % update performance with this fold
    end
    cp.CorrectRate
    cp.CountingMatrix
The values of pred are [1;1;1;1;1;1], but my CorrectRate is 0.65 (65%). TrainLabel is <60x1 double> and TrainVec is <60x5900 double>.
Two more questions:
Must the values of TrainLabel be 0 and 1? Is it OK if they are -1 and 1?
TrainVec is arranged so that the features from one image are placed in one row, and the features from the next image are placed in the next row. Is this correct, or must the features of each image be placed in a separate column?
I need some help with this. Thanks.
Upvotes: 1
Views: 490
Reputation: 9549
You just have too many features.
You are trying to find a separating 5899-dimensional hyperplane using only 60 training points. That is simply not going to work because of the curse of dimensionality (a.k.a. the Hughes effect).
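To illustrate why so few points in so many dimensions is a problem, here is a minimal sketch that uses pure noise in place of real features. The sizes match the question, but the data, the random labels, and the variable names are all made up for illustration, and it uses a plain least-squares hyperplane rather than an SVM.

```matlab
rng(0);                              % reproducible random numbers
n = 60;  d = 5900;                   % same shape as TrainVec in the question
X = randn(n, d);                     % pure noise "features" (hypothetical data)
y = 2 * (rand(n, 1) > 0.5) - 1;      % random labels in {-1, +1}

w = pinv(X) * y;                     % minimum-norm solution of X*w = y
trainAcc = mean(sign(X * w) == y);   % fraction of training points on the correct side
```

With far more dimensions than samples, trainAcc should come out at 1 even though the labels are random: a hyperplane that fits the training points always exists, so fitting them says nothing about how unseen samples will be classified.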
You need to extract the relevant features first and work only on those. This is called feature extraction.
One easy way of doing this is to use pcacov to transform your data with Principal Component Analysis, then keep only a fraction of the components (use the third output, EXPLAINED, to keep the k PCs that explain a certain level of variance, such as 98%). That will cut down the dimensionality of your problem and very likely improve your results.
Do remember to transform all your data, not just the training set.
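A minimal sketch of that reduction, assuming the Statistics Toolbox is available. X here is hypothetical stand-in data (in the question it would be TrainVec), and the names coeff, k, mu, and Z are just illustrative.

```matlab
% Hypothetical stand-in data: 60 samples with 200 correlated features.
rng(0);
X = randn(60, 200) * randn(200, 200);

% PCA on the covariance matrix of the features.
[coeff, latent, explained] = pcacov(cov(X));

% Smallest k whose components together explain at least 98% of the variance.
k = find(cumsum(explained) >= 98, 1, 'first');

% Center every sample with the same mean and project onto the first k PCs.
mu = mean(X, 1);
Z = bsxfun(@minus, X, mu) * coeff(:, 1:k);   % 60-by-k reduced feature matrix
```

Z would then take the place of TrainVec in the cross-validation loop. Any new sample has to be centered with the same mu and projected with the same coeff(:, 1:k) before it is classified.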
Aside from that, your approach seems correct to me. Different samples go in different rows, with their features spanning the columns.
The label vector can be whatever you want:
Y is a grouping variable, i.e., it can be a categorical, numeric, or logical vector; a cell vector of strings; or a character matrix with each row representing a class label (see help for groupingvariable)
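As a rough illustration of what "grouping variable" means, the sketch below passes three made-up label vectors through grp2idx from the Statistics Toolbox. The assumption here is that svmtrain interprets labels the same way grp2idx does; the label values themselves are hypothetical.

```matlab
numericLabels = [-1; 1; 1; -1];                 % -1 / 1 coding
logicalLabels = [false; true; true; false];     % 0 / 1 coding
stringLabels  = {'neg'; 'pos'; 'pos'; 'neg'};   % text labels

% grp2idx maps any grouping variable to integer group indices.
gNum = grp2idx(numericLabels);
gLog = grp2idx(logicalLabels);
gStr = grp2idx(stringLabels);

% True when all three codings describe the same two groups.
sameGrouping = isequal(gNum, gLog, gStr);
```

If sameGrouping is true, the three codings are interchangeable as far as grouping goes, so -1 and 1 should work just as well as 0 and 1.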
Upvotes: 4