Reputation: 2164
What is the cost parameter C mean in SVM? I mean, if C is large, does it mean "I cannot tolerate wrong classification"?
And how could I determine the range and step size when finding best parameters in experiment?
By the way, what is the standard to decide which parameter is better? The number of errors from cross-validation or the number of support vectors we get from SVM?
Upvotes: 4
Views: 17243
Reputation: 175
Take it this way. C parameter in SVM is Penalty parameter of the error term. You can consider it as the degree of correct classification that the algorithm has to meet or the degree of optimization the the SVM has to meet.
For greater values of C, there is no way that SVM optimizer can misclassify any single point. Yes, as you said, the tolerance of the SVM optimizer is high for higher values of C . But for Smaller C, SVM optimizer is allowed at least some degree of freedom so as to meet the best hyperplane !
SVC(C=1.0, kernel='rbf', degree=3, gamma='auto')
--> Low Tolerant RBF Kernels
SVC(C=1000.0,kernel='linear',degree=3,gamma='auto')
-->High Tolerant linear Kernels
Refer : http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC
Upvotes: 3
Reputation: 12152
What is C?
The optimization problem SVM training solves has two terms:
C is just the balance between the importance of these to terms. If C is high you're giving a lot of weight to (2), if C is low you're giving a lot of weight to (1).
If I just want accurate results why don't I just set C really high?
Term (1) guards against over fitting (being really good at classifying training data but very bad at classifying unseen testing data)
Ok I just want accurate results why don't I just set C really low then?
Term (2) makes sure that the training optimization pays attention to the training data, you don't just want "simple" (in the L2 sense) weights, you want simple weights that classify the training data correctly.
SUMMARY:
Training an SVM is a balance of two terms. C is the relative importance of the loss term with respect to the regularization term.
Upvotes: 5
Reputation: 2164
-C means how you can tolerate the classification error (slack variable). To find a better model, I guess we have to just set a large threshold and do cross-validation.
Upvotes: 0