Prateek Narendra

Reputation: 1937

Error in using Non Linear SVM in Scikit-Learn

I am trying to fit a non-linear SVM (RBF kernel) with the following code.

import numpy as np
from sklearn import svm

raw_data1 = open("/Users/prateek/Desktop/Programs/ML/Dataset.csv")
raw_data2 = open("/Users/prateek/Desktop/Programs/ML/Result.csv")

dataset1 = np.loadtxt(raw_data1,delimiter=",")
result1 = np.loadtxt(raw_data2,delimiter=",")

clf = svm.NuSVC(kernel='rbf')
clf.fit(dataset1,result1)

However, when I try to fit, I get the error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/prateek/Desktop/Programs/ML/lib/python2.7/site-packages/sklearn/svm/base.py", line 193, in fit
    fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
  File "/Users/prateek/Desktop/Programs/ML/lib/python2.7/site-packages/sklearn/svm/base.py", line 251, in _dense_fit
    max_iter=self.max_iter, random_seed=random_seed)
  File "sklearn/svm/libsvm.pyx", line 187, in sklearn.svm.libsvm.fit (sklearn/svm/libsvm.c:2098)
ValueError: specified nu is infeasible

Link for Results.csv

Link for dataset

What is the reason for such an error?

Upvotes: 3

Views: 4458

Answers (1)

Guiem Bosch

Reputation: 2758

The nu parameter is, as pointed out in the documentation, "An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors".

So whenever you fit your data and this bound cannot be satisfied, the optimization problem becomes infeasible, and that is what raises your error.
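To make "cannot be satisfied" more concrete: as far as I can tell from libsvm's parameter check (this is my reading of the library source, not something the scikit-learn documentation states), nu is rejected when, for some pair of classes with n_i and n_j samples, nu * (n_i + n_j) / 2 > min(n_i, n_j). Under that assumption, the largest usable nu is set by your most unbalanced pair of classes. A minimal sketch, where max_feasible_nu is a hypothetical helper and the labels are made up:

```python
from itertools import combinations

import numpy as np


def max_feasible_nu(y):
    """Largest nu that passes the assumed libsvm check for the labels y.

    Assumption (not from the scikit-learn docs): nu is infeasible if
    nu * (n_i + n_j) / 2 > min(n_i, n_j) for any pair of classes i, j.
    """
    _, counts = np.unique(y, return_counts=True)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    # For each pair of classes, the bound is 2 * min(n_i, n_j) / (n_i + n_j)
    bounds = [2.0 * min(a, b) / (a + b) for a, b in combinations(counts, 2)]
    return min(bounds)


# Made-up labels: 95 samples of class 0 and 5 samples of class 1
y_demo = np.array([0] * 95 + [1] * 5)
print(max_feasible_nu(y_demo))
```

If the value this gives for your Result.csv is below NuSVC's default of nu=0.5, that would explain why the default fails.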

In fact, I looped over nu from 1.0 down to 0.1 in steps of 0.1 and still got the error every time; when I then tried 0.01, the fit went through without complaint. Of course, you should still check the model fitted with that value and confirm that its accuracy on predictions is acceptable.
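That kind of search can be sketched as below. This is not the exact loop I ran, just an illustration: feasible_nus is a hypothetical helper, and the data is made up and deliberately unbalanced so that large nu values fail.

```python
import numpy as np
from sklearn import svm


def feasible_nus(X, y, candidates):
    """Return the candidate nu values for which NuSVC.fit does not raise."""
    ok = []
    for nu in candidates:
        try:
            svm.NuSVC(kernel='rbf', nu=nu).fit(X, y)
        except ValueError:
            # "specified nu is infeasible" surfaces as a ValueError;
            # note this also swallows any other ValueError from fit
            continue
        ok.append(nu)
    return ok


# Made-up data standing in for Dataset.csv / Result.csv
rng = np.random.RandomState(0)
X_demo = rng.randn(100, 2)
y_demo = np.array([0] * 95 + [1] * 5)

print(feasible_nus(X_demo, y_demo, [0.5, 0.2, 0.05, 0.01]))
```

A nu that fits without error is only feasible, not necessarily good, so it still needs validating on held-out data.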

Update: out of curiosity I split your dataset into training and test sets to validate, and the result was 69% accuracy (I also think your training set might be very small).

For reproducibility, here is the quick test I ran:

from sklearn import svm
import numpy as np
# train_test_split lives in sklearn.model_selection since scikit-learn 0.18;
# the old sklearn.cross_validation module was removed in 0.20
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

raw_data1 = open("Dataset.csv")
raw_data2 = open("Result.csv")
dataset1 = np.loadtxt(raw_data1,delimiter=",")
result1 = np.loadtxt(raw_data2,delimiter=",")

# nu=0.01 was the first value I tried that did not raise the error
clf = svm.NuSVC(kernel='rbf',nu=0.01)
# hold out 25% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(dataset1,result1, test_size=0.25, random_state=42)
clf.fit(X_train,y_train)
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

Upvotes: 2
