Reputation: 151
Two questions about using libsvm in python:
I use a simple example with 4 training points (depicted by *) in a 2D space:
*----*
| |
| |
*----*
I train an SVM with the C_SVC formulation and a linear kernel, classifying the 4 points into two labels [-1, +1].
For example, when I label the training points like this, it should find a separating hyperplane.
{-1}----{+1}
| |
| |
{-1}----{+1}
But with this nonlinear (XOR-like) problem, it should not be able to find a separating hyperplane, because of the linear kernel.
{+1}----{-1}
| |
| |
{-1}----{+1}
And I would like to be able to detect this case.
Sample code for the 2nd example:
from svmutil import *
y = [1, -1, 1, -1]
x = [{1:-1, 2 :1}, {1:-1, 2:-1}, {1:1, 2:-1}, {1:1, 2:1}]
prob = svm_problem(y, x)
param = svm_parameter()
param.kernel_type = LINEAR
param.C = 10
m = svm_train(prob, param)
Sample output:
optimization finished, #iter = 21
nu = 1.000000
obj = -40.000000, rho = 0.000000
nSV = 4, nBSV = 4
Total nSV = 4
Upvotes: 1
Views: 803
Reputation: 40169
Run cross-validation over an exponential grid of C values for a linear-kernel SVM, as explained in the libsvm guide. If the training-set accuracy never gets close to 100%, the linear model is too biased for the data, which in turn means the linearity assumption is false (the data is not linearly separable).
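A sketch of that training-accuracy check, assuming the standard libsvm Python interface (svmutil, as in the question's code) with its svm_train(y, x, options) and svm_predict conveniences; the demo is guarded with try/except so it is simply skipped where libsvm is not installed:

```python
# Detect non-separability by predicting the TRAINING points themselves:
# with a linear kernel and a large C, a separable labeling reaches 100%
# training accuracy, while the XOR-like labeling cannot.
try:
    from svmutil import *  # libsvm's python interface
    HAVE_LIBSVM = True
except ImportError:
    HAVE_LIBSVM = False  # libsvm not installed; skip the demo

acc = None
if HAVE_LIBSVM:
    # The XOR-like labeling from the question.
    y = [1, -1, 1, -1]
    x = [{1: -1, 2: 1}, {1: -1, 2: -1}, {1: 1, 2: -1}, {1: 1, 2: 1}]
    # '-t 0' = linear kernel, '-c 10' = C, '-q' = quiet training output.
    m = svm_train(y, x, '-t 0 -c 10 -q')
    # Re-predict the training set; p_acc[0] is the accuracy in percent.
    p_label, p_acc, p_val = svm_predict(y, x, m)
    acc = p_acc[0]
    # acc stays well below 100% here; repeating this across an exponential
    # grid of C values (e.g. 2**-5 .. 2**15) makes the check robust.
```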
By the way, the test-set accuracy is the real evaluation of the model's generalization ability, but it measures the sum of bias and variance and so cannot be used to measure the bias alone. The difference between the training and test accuracies measures the variance, i.e. the overfitting, of the model. More on error analysis can be found in this blog post summarizing practical tips and tricks from the ml-class online course.
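To make the bias check concrete without any dependencies, here is a sketch using a plain perceptron as a stand-in for the linear-kernel SVM (an assumption of this example, not part of libsvm): the perceptron reaches zero training mistakes if and only if the data is linearly separable, so capping its epochs gives a simple separability probe for the two labelings above.

```python
def perceptron_separable(points, labels, max_epochs=1000):
    """Return True if some linear boundary (with bias) fits all points.

    The perceptron converges on a linearly separable set; on a
    non-separable set it keeps making mistakes, so we cap the epochs.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                # Misclassified (or on the boundary): standard update.
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            return True  # perfect fit found -> separable
    return False  # never converged -> (almost surely) not separable

# Same corner coordinates as the question's feature dicts.
corners = [(-1, 1), (-1, -1), (1, -1), (1, 1)]

# First example: left column -1, right column +1 -> separable.
print(perceptron_separable(corners, [-1, -1, 1, 1]))  # True

# Second (XOR-like) example from the question -> not separable.
print(perceptron_separable(corners, [1, -1, 1, -1]))  # False
```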
Upvotes: 2