user999450

Reputation: 249

Not able to classify the features correctly

I am following the example on this page : Example of 10-fold SVM classification in MATLAB.

Basically, I am following the example to run my classification. The problem I face is that pred is always positive; it never detects the negative data.

clear all;
clc;
load('C:\Users\HP\Documents\MATLAB\TrainLabel');
load('C:\Users\HP\Documents\MATLAB\TrainVec');
cvFolds = crossvalind('Kfold', TrainLabel, 10);  
cp = classperf(TrainLabel);   
for i = 1:10                                   
    testIdx = (cvFolds == i);                   
    trainIdx = ~testIdx;                             
%     Model = svmtrain(TrainVec(trainIdx,:), TrainLabel(trainIdx),'showplot',true); 
    Model = svmtrain(TrainVec(trainIdx,:), TrainLabel(trainIdx), ...              
     'Autoscale',true, 'Showplot',false, 'Method','QP', ...              
     'BoxConstraint',2e-1, 'Kernel_Function','rbf', 'RBF_Sigma',1);
    pred = svmclassify(Model, TrainVec(testIdx,:),'Showplot',false);      
    cp = classperf(cp, pred, testIdx);
end 
cp.CorrectRate 
cp.CountingMatrix 

The values of pred are always [1;1;1;1;1;1], yet cp.CorrectRate is 0.65 (65%). TrainLabel is a <60x1 double> and TrainVec is a <60x5900 double>.

Two more questions:

  1. Must the values of TrainLabel be 0 and 1? Is it OK if they are -1 and 1?

  2. TrainVec is arranged so that the features from one image are placed in a row, and the features from the next image are placed in the next row. Is this correct, or must each feature be placed in a different column?

Need some help on this... thanks

Upvotes: 1

Views: 490

Answers (2)

user999450

Reputation: 249

Scale the values to between 0 and 1. This will solve the problem.

Upvotes: -1

jpjacobs

Reputation: 9549

You simply have too many features.

You are trying to find a separating 5899-dimensional hyperplane using only 60 training points. That simply is not going to work, because of the curse of dimensionality (a.k.a. the Hughes effect).

You need to extract relevant features first, and work on only those. This is called Feature Extraction.

One easy way of doing this is using pcacov to transform your data with Principal Component Analysis, then keeping only a certain fraction of the components (use the third output, EXPLAINED, to keep the k PCs explaining a certain level of variance, like 98%). That will cut down the dimensionality of your problem and very likely improve your results.
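A minimal sketch of that PCA step, assuming TrainVec is the 60x5900 matrix from the question (note that cov(TrainVec) is 5900x5900, so this is memory-hungry; with far more features than samples, princomp in economy mode is a cheaper alternative):

```matlab
% Sketch only: reduce TrainVec (60x5900) with PCA before training the SVM.
% pcacov takes a covariance matrix; its third output, EXPLAINED, gives the
% percentage of total variance accounted for by each principal component.
[coeff, latent, explained] = pcacov(cov(TrainVec));
k  = find(cumsum(explained) >= 98, 1);  % smallest k explaining >= 98% variance
mu = mean(TrainVec, 1);                 % center with the training mean
ReducedVec = bsxfun(@minus, TrainVec, mu) * coeff(:, 1:k);
% Use ReducedVec (60 x k) in place of TrainVec in svmtrain/svmclassify,
% applying the same mu and coeff to any held-out data as well.
```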

Do remember to transform all your data, not just the training set.

Aside from that, your approach seems correct to me. Different samples go in different rows, with their features spanning the columns.

The label vector can be whatever you'd want:

Y is a grouping variable, i.e., it can be a categorical, numeric, or logical vector; a cell vector of strings; or a character matrix with each row representing a class label (see help for groupingvariable)
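So either encoding from your first question works. A toy illustration (made-up data, not yours):

```matlab
% svmtrain accepts any grouping variable as the label vector,
% so -1/+1 and 0/1 encodings are both valid.
X   = [randn(10,2) - 1; randn(10,2) + 1];  % 20 samples, 2 features
Ypm = [-ones(10,1); ones(10,1)];           % labels as -1/+1
Y01 = [zeros(10,1); ones(10,1)];           % labels as 0/1
M1  = svmtrain(X, Ypm);                    % both calls train fine
M2  = svmtrain(X, Y01);
```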

Upvotes: 4
