Reputation: 145
I used 1~200 data as trainning data, 201~220 as testing data format likes: 3 class(class 1,class 2, class 3) and 20 features
2 1:100 2:96 3:88 4:94 5:96 6:94 7:72 8:68 9:69 10:70 11:76 12:70 13:73 14:71 15:74 16:76 17:78 18:81 19:76 20:76
2 1:96 2:100 3:88 4:88 5:90 6:98 7:71 8:66 9:63 10:74 11:75 12:66 13:71 14:68 15:74 16:78 17:78 18:85 19:77 20:76
2 1:88 2:88 3:100 4:96 5:91 6:89 7:70 8:70 9:68 10:74 11:76 12:71 13:73 14:74 15:79 16:77 17:73 18:80 19:78 20:78
2 1:94 2:87 3:96 4:100 5:92 6:88 7:76 8:73 9:71 10:70 11:74 12:67 13:71 14:71 15:76 16:77 17:71 18:80 19:73 20:73
2 1:96 2:90 3:91 4:93 5:100 6:92 7:74 8:67 9:67 10:75 11:75 12:67 13:74 14:73 15:77 16:77 17:75 18:82 19:76 20:74
2 1:93 2:98 3:90 4:88 5:92 6:100 7:73 8:66 9:65 10:73 11:78 12:69 13:73 14:72 15:75 16:74 17:75 18:83 19:79 20:77
3 1:73 2:71 3:73 4:76 5:74 6:73 7:100 8:79 9:79 10:71 11:65 12:58 13:67 14:73 15:74 16:72 17:60 18:63 19:64 20:60
3 1:68 2:66 3:70 4:73 5:68 6:67 7:78 8:100 9:85 10:77 11:57 12:57 13:58 14:62 15:68 16:64 17:59 18:57 19:57 20:59
3 1:69 2:64 3:70 4:72 5:69 6:65 7:78 8:85 9:100 10:70 11:56 12:63 13:62 14:61 15:64 16:69 17:56 18:55 19:55 20:51
3 1:71 2:74 3:74 4:70 5:76 6:73 7:71 8:73 9:71 10:100 11:58 12:58 13:59 14:60 15:58 16:65 17:57 18:57 19:63 20:57
1 1:77 2:75 3:76 4:73 5:75 6:79 7:66 8:56 9:56 10:59 11:100 12:77 13:84 14:79 15:82 16:80 17:82 18:82 19:81 20:82
1 1:70 2:66 3:71 4:67 5:67 6:70 7:63 8:57 9:62 10:58 11:77 12:100 13:84 14:75 15:76 16:78 17:73 18:72 19:87 20:80
1 1:73 2:72 3:73 4:71 5:74 6:74 7:68 8:58 9:61 10:59 11:84 12:84 13:100 14:86 15:88 16:91 17:81 18:81 19:84 20:86
1 1:71 2:69 3:75 4:71 5:73 6:73 7:74 8:61 9:61 10:60 11:79 12:75 13:86 14:100 15:90 16:88 17:74 18:79 19:81 20:82
1 1:74 2:74 3:80 4:76 5:78 6:76 7:73 8:66 9:64 10:59 11:81 12:76 13:88 14:90 15:100 16:93 17:74 18:83 19:81 20:85
1 1:76 2:77 3:77 4:76 5:78 6:75 7:73 8:64 9:68 10:65 11:80 12:78 13:91 14:88 15:93 16:100 17:79 18:79 19:82 20:83
1 1:78 2:78 3:73 4:71 5:75 6:75 7:61 8:58 9:57 10:56 11:82 12:73 13:81 14:74 15:74 16:80 17:100 18:85 19:80 20:85
1 1:81 2:85 3:79 4:80 5:82 6:82 7:63 8:56 9:55 10:57 11:82 12:72 13:81 14:79 15:83 16:79 17:85 18:100 19:83 20:79
1 1:76 2:77 3:78 4:75 5:76 6:79 7:65 8:57 9:57 10:63 11:81 12:87 13:84 14:81 15:81 16:82 17:80 18:83 19:100 20:87
1 1:76 2:76 3:78 4:73 5:75 6:78 7:60 8:59 9:51 10:57 11:82 12:80 13:86 14:82 15:85 16:83 17:85 18:79 19:87 20:100
Then, I write code to classify them:
% read the data set
[image_label, image_features] = libsvmread(fullfile('D:\...'));
[N D] = size(image_features);
% Determine the train and test index
trainIndex = zeros(N,1);
trainIndex(1:200) = 1;
testIndex = zeros(N,1);
testIndex(201:N) = 1;
trainData = image_features(trainIndex==1,:);
trainLabel = image_label(trainIndex==1,:);
testData = image_features(testIndex==1,:);
testLabel = image_label(testIndex==1,:);
% Train the SVM
model = svmtrain(trainLabel, trainData, '-c 1 -g 0.05 -b 1');
% Use the SVM model to classify the data
[predict_label, accuracy, prob_values] = svmpredict(testLabel, testData, model, '-b 1');
But the final result for predict_label are all class 1, so the accuracy is 50%, which that it cannot get the correct predict label for class 2 and 3. Is there something wrong from the format of data, or the code that I implemented? Please help me, thanks very much.
Upvotes: 0
Views: 580
Reputation: 66775
To elaborate a bit more about the problem, there are at least three problems here:
You just check one values of parameters C
(c) and Gamma
(g) - behaviour of SVM is heavily dependant on the good choice of these parameters, so it is a common approach to use a grid search using cross validation testing for selecting the best ones.
Data scale also plays an important role here, if some of the dimensions are much bigger then the rest, you will bias the whole classifier, in order to deal with it there are at least two basic approaches: 1. Scale linearly each dimension to some interval (like [0,1] or [-1,1]) or normalize the data by transformation through Sigma^(-1/2) where Sigma is a data covariance matrix
Label imbalance - SVM works best when you have exactly the same amount of points in each class. Once it is not true, you should use the class weighting scheme in order to get valid results.
After fixing these three issues you should get reasonable results.
Upvotes: 2
Reputation: 9549
My guess is that you'd want to tune your parameters.
Make a loop over your -c
and -g
values (typically logarithimically, eg -c
10^(-3:5)
) and pick the one that is best.
That said, it is advisable to normalize your data, eg. scale it such that all values are between 0 and 1.
Upvotes: 2