Reputation: 163
Problem: I need to train a classifier (in matlab) to classify multiple levels of signal noise.
So i trained a multi class SVM in matlab using the fitcecoc and obtained an accuracy of 92%.
Then i trained a multiclass SVM using sklearn.svm.svc in python, but it seems that however i fiddle with the parameters, i cannot achieve more than 69% accuracy.
30% of the data was held back and used to verify the training. the confusion matrixes can be seen below.
So if anyone has some experience or suggestions with svm.svc multiclass training and can see a problem in my code, or has a suggestion it would be greatly appreciated.
Python code:
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
#from sklearn import preprocessing
#### SET fitting parameters here
C = 100
gamma = 1e-8
#### SET WEIGHTS HERE
C0_Weight = 1*C
C1_weight = 1*C
C2_weight = 1*C
C3_weight = 1*C
C4_weight = 1*C
#####
X = np.genfromtxt('data/features.csv', delimiter=',')
Y = np.genfromtxt('data/targets.csv', delimiter=',')
print 'feature data is of size: ' + str(X.shape)
print 'target data is of size: ' + str(Y.shape)
# SPLIT X AND Y INTO TRAINING AND TEST SET
test_size = 0.3
X_train, x_test, Y_train, y_test = train_test_split(X, Y,
... test_size=test_size, random_state=0)
svc = svm.SVC(C=C,kernel='rbf', gamma=gamma, class_weight = {0:C0_Weight,
... 1:C1_weight, 2:C2_weight, 3:C3_weight, 4:C4_weight},cache_size = 1000)
svc.fit(X_train, Y_train)
scores = cross_val_score(svc, X_train, Y_train, cv=10)
print scores
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Out = svc.predict(x_test)
np.savetxt("data/testPredictions.csv", Out, delimiter=",")
np.savetxt("data/testTargets.csv", y_test, delimiter=",")
# calculate accuracy in test data
Hits = 0
HitsOverlap = 0
for idx, val in enumerate(Out):
Hits += int(y_test[idx]==Out[idx])
HitsOverlap += int(y_test[idx]==Out[idx]) + int(y_test[idx]==
... (Out[idx]-1)) + int(y_test[idx]==(Out[idx]+1))
print "Accuracy in testset: ", Hits*100/(11595*test_size)
print "Accuracy in testset w. overlap: ", HitsOverlap*100/(11595*test_size)
to those curious how i got the parameters, they were found with GridSearchCV (and increased the accuracy from 40% to 69)
Any help or suggestions is greatly appreciated.
Upvotes: 1
Views: 2297
Reputation: 163
After much pulling my hair, the answer was found here: http://neerajkumar.org/writings/svm/
when the inputs were scaled with StandardScaler(), svm.svc now produces superior results to matlab!!
Upvotes: 1