Reputation: 1015
I have a dataset with 4000 features and 35 samples. All the features are floating-point numbers between 1 and 3, e.g. 2.68244527684596.
I'm struggling to get any classifier working on this data. I have tried kNN and SVM (with linear, RBF, and polynomial kernels). Then I learned about normalization, but it's still a bit complex for me, and I cannot get this code to work and give me proper predictions.
The code I'm using to normalize data is:
from sklearn import preprocessing
train_data = preprocessing.scale(train_data)
train_data = preprocessing.normalize(train_data, norm='l1', axis=0)
The code I'm trying to classify with is:
from sklearn import svm, linear_model
from sklearn.neighbors import KNeighborsClassifier
# SVM with polynomial kernel: train on all but the last 5 samples, predict the last 5
svc1 = svm.SVC(kernel='poly', degree=3)
svc1.fit(train_data[:-5], train_labels[:-5])
print("Poly SVM: ", svc1.predict(train_data[-5:]))
# SVM with RBF kernel
svc2 = svm.SVC(kernel='rbf')
svc2.fit(train_data[:-5], train_labels[:-5])
print("RBF SVM: ", svc2.predict(train_data[-5:]))
# SVM with linear kernel
svc3 = svm.SVC(kernel='linear')
svc3.fit(train_data[:-5], train_labels[:-5])
print("Linear SVM: ", svc3.predict(train_data[-5:]))
# KNN
knn = KNeighborsClassifier()
knn.fit(train_data[:-5], train_labels[:-5])
print("KNN :", knn.predict(train_data[-5:]))
# Logistic regression: trained on samples 5 onward, scored on samples 0 to 3
logistic = linear_model.LogisticRegression()
print('LogisticRegression score: %f' % logistic.fit(train_data[5:], train_labels[5:]).score(train_data[0:4], train_labels[0:4]))
I'm a newbie to machine learning and I'm working hard to learn more about all the concepts. I thought someone might point me in the right direction.
Note: I have only 35 samples and this is part of an assignment. I cannot get more data :(
Upvotes: 0
Views: 1477
Reputation: 66805
If your data is not specific in any sense, then standardization with preprocessing.scale should be just fine. It forces each dimension to have zero mean and a standard deviation of 1, so, more or less, it tries to enclose the data in a 0-centered ball. It is worth noting that you should not use normalize: it forces each sample to have unit norm, which has to be justified by your data (you are then forcing your points onto a sphere). That is rarely the case.
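To make the difference concrete, here is a minimal sketch comparing the two functions. The array `X` is a made-up random stand-in for the question's data (35 samples, 4000 features in the range 1 to 3), not the real dataset, and `normalize` is called with its default settings (L2 norm, per sample). The question's call with `axis=0` would instead rescale each feature column.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical stand-in for the question's data: 35 samples, 4000 features in [1, 3].
rng = np.random.RandomState(0)
X = rng.uniform(1.0, 3.0, size=(35, 4000))

# Standardization works per feature (column): zero mean, unit standard deviation.
X_scaled = preprocessing.scale(X)
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

# Normalization (default axis=1, L2 norm) works per sample (row):
# every row is rescaled to unit length, i.e. placed on the unit sphere.
X_normalized = preprocessing.normalize(X)
row_norms = np.linalg.norm(X_normalized, axis=1)

print("largest |column mean| after scale:", np.abs(col_means).max())
print("column std range after scale:", col_stds.min(), col_stds.max())
print("row norm range after normalize:", row_norms.min(), row_norms.max())
```

After `scale`, the statistics are uniform across features; after `normalize`, they are uniform across samples, which is a different kind of constraint.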
There might be dozens of reasons why your classifiers do not work. In particular - is it your testing code? If so:
Upvotes: 2