Reputation: 5494
I am following the example shown in http://scikit-learn.org/stable/auto_examples/svm/plot_oneclass.html#example-svm-plot-oneclass-py, where a one-class SVM is used for anomaly detection. This may be notation unique to scikit-learn, but I couldn't find an explanation of how to use the parameter nu passed to the OneClassSVM constructor.
In http://scikit-learn.org/stable/modules/svm.html#nusvc, it is stated that the parameter nu is a reparameterization of the parameter C (the regularization parameter, which I am familiar with), but it doesn't say how to perform that reparameterization.
Both a formula and an intuition will be much appreciated.
Thanks!
Upvotes: 33
Views: 26672
Reputation: 3217
nu in support vector machines is a hyperparameter.
In C-SVM, if we want to classify a query point x_q, we evaluate

f(x_q) = Σ_{i=1}^{n} α_i y_i x_iᵀ x_q + b

As we know, α_i > 0 for support vectors and α_i = 0 for non-support vectors, so only the support vectors matter when computing f(x_q). But in regular C-SVM we have no control over the number of support vectors, and this is where nu-SVM comes in.
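As a minimal sketch (on a hypothetical toy dataset with a linear-kernel SVC, which is an assumption on my part, not part of the question), we can verify this dual form against scikit-learn's own decision_function:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical toy data, only to illustrate the dual form of f(x_q).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

x_q = X[0]
# dual_coef_ holds alpha_i * y_i for the support vectors only;
# every other alpha_i is zero and drops out of the sum.
f_manual = clf.dual_coef_ @ (clf.support_vectors_ @ x_q) + clf.intercept_
f_sklearn = clf.decision_function(x_q.reshape(1, -1))
print(np.allclose(f_manual, f_sklearn))  # True: only support vectors matter
```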
nu is:
an upper bound on the fraction of margin errors,
a lower bound on the fraction of support vectors.
nu always lies in the interval 0 < nu <= 1.
Let's say nu = 0.1 and n = 10,000:
1. We want at most 10% margin errors => at most 1,000 error points.
2. We get at least 10% support vectors => at least 1,000 support vectors.
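A quick empirical check with scikit-learn's NuSVC (a sketch on synthetic data; the exact fractions depend on the data and kernel):

```python
from sklearn.datasets import make_classification
from sklearn.svm import NuSVC

# Synthetic data standing in for the n = 10,000 points above.
X, y = make_classification(n_samples=10_000, n_features=10, random_state=0)
clf = NuSVC(nu=0.1).fit(X, y)

# nu = 0.1 lower-bounds the fraction of support vectors at ~10%.
frac_sv = clf.support_vectors_.shape[0] / X.shape[0]
print(f"fraction of support vectors: {frac_sv:.3f}")  # expected >= 0.1
```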
Upvotes: -1
Reputation: 5267
The problem with the parameter C is that it can take any positive value and has no direct interpretation. It is therefore hard to choose correctly, and one has to resort to cross-validation or direct experimentation to find a suitable value.
In response, Schölkopf et al. reformulated the SVM to take a new regularization parameter nu. This parameter is:
The parameter nu is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors, relative to the total number of training examples. For example, if you set it to 0.05, you are guaranteed to find at most 5% of your training examples being margin errors (at the cost of a small margin, though) and at least 5% of your training examples being support vectors.
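Tying this back to the original question, here is a sketch with OneClassSVM on synthetic "normal" data (modeled loosely on the linked scikit-learn example; the exact fractions vary with the data and kernel parameters):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = 0.3 * rng.randn(1000, 2)  # synthetic "normal" observations

clf = OneClassSVM(nu=0.05, kernel="rbf", gamma=0.1).fit(X_train)

# Fraction of training points flagged as outliers (-1), and
# fraction of training points retained as support vectors.
outlier_frac = np.mean(clf.predict(X_train) == -1)
sv_frac = clf.support_vectors_.shape[0] / X_train.shape[0]
print(outlier_frac, sv_frac)  # roughly <= 0.05 and >= 0.05
```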
The relation between C and nu is governed by the following formula:
nu = A + B/C
A and B are constants which are unfortunately not that easy to calculate.
The takeaway message is that the C-SVM and the nu-SVM are equivalent in terms of classification power. The regularization in terms of nu is easier to interpret than C, but the nu-SVM is usually harder to optimize, and its runtime doesn't scale as well with the number of input samples as the C variant.
More details (including formulas for A and B) can be found here: Chang CC, Lin CJ - "Training nu-support vector classifiers: theory and algorithms"
Upvotes: 58