VineetChirania

Reputation: 871

Calculate probability estimate P(y|x) per sample x in scikit for LinearSVC

I am training a classifier on my dataset with LinearSVC in scikit-learn. Can I get the probability with which a sample is classified under a given label?

For example, fitting the data with SGDClassifier(loss="log") enables the predict_proba method, which returns a vector of probability estimates P(y|x) per sample x:

>>> clf = SGDClassifier(loss="log").fit(X, y)
>>> clf.predict_proba([[1., 1.]])

Output:

array([[ 0.0000005,  0.9999995]])

Is there a similar function I can use to get prediction probabilities with svm.LinearSVC (multi-class classification)? I know the decision_function method returns confidence scores for the samples in this case. Is there any way to compute probability estimates from this decision function?

Upvotes: 1

Views: 2089

Answers (2)

Fred Foo

Reputation: 363517

No, LinearSVC will not compute probabilities because it's not trained to do so. Use sklearn.linear_model.LogisticRegression, which uses the same algorithm as LinearSVC but with the log loss. It uses the standard logistic function for probability estimates:

1. / (1 + exp(-decision_function(X)))

(For the same reason, SGDClassifier does not output probabilities with its default hinge loss, which makes it learn a linear SVM; it does with loss="log".)
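To make the formula above concrete, here is a minimal sketch for the binary case with default settings. The synthetic dataset from make_classification is only an assumption for illustration, not part of the question. It compares predict_proba against the logistic function applied to decision_function:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem (illustrative data, not from the question).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression().fit(X, y)

# Probabilities as reported by the estimator, one column per class.
proba = clf.predict_proba(X)

# The same quantity computed by hand from the decision function.
manual = 1.0 / (1.0 + np.exp(-clf.decision_function(X)))

# Column 1 holds P(y = clf.classes_[1] | x).
print(np.allclose(proba[:, 1], manual))
```

With more than two classes the decision function has one column per class, so the by-hand version would also need a normalization step.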

Upvotes: 2

alko

Reputation: 48307

Multi-class classification here is done one-vs-all. An SGDClassifier returns, for each class, the distance to the hyperplane corresponding to that class, and with loss="modified_huber" the probability is calculated as

(clip(decision_function(X), -1, 1) + 1) / 2

Refer to the source code for details.

You could implement a similar function for LinearSVC. That seems reasonable to me, although it probably needs some justification. Refer to the paper mentioned in the docs:

Zadrozny and Elkan, "Transforming classifier scores into multiclass probability estimates", SIGKDD '02, http://www.research.ibm.com/people/z/zadrozny/kdd2002-Transf.pdf
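A rough sketch of what such a function might look like for LinearSVC follows. The helper name scores_to_proba, the row normalization across classes, and the uniform fallback for rows that clip to all zeros are my assumptions, not something LinearSVC provides, and the resulting numbers are not calibrated probabilities:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC


def scores_to_proba(scores):
    """Hypothetical helper: turn one-vs-all margins into pseudo-probabilities."""
    # Clip margins to [-1, 1], then map that interval linearly onto [0, 1].
    proba = (np.clip(scores, -1, 1) + 1) / 2
    if proba.ndim == 1:
        # Binary case: a single score column for the positive class.
        return np.column_stack([1 - proba, proba])
    row_sums = proba.sum(axis=1, keepdims=True)
    # If every class scored <= -1 the row is all zeros; fall back to uniform.
    all_zero = row_sums.ravel() == 0
    proba[all_zero] = 1.0
    row_sums[all_zero] = proba.shape[1]
    # Normalize so each row sums to one.
    return proba / row_sums


# Synthetic 3-class problem (illustrative data, not from the question).
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
clf = LinearSVC(max_iter=10000).fit(X, y)
proba = scores_to_proba(clf.decision_function(X))
print(proba[:3])
```

Because of the clipping, several classes can tie at the maximum value, so the argmax of these rows need not agree with clf.predict.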

P.s. A comment from "Is there 'predict_proba' for LinearSVC?":

If you want probabilities, you should either use logistic regression or SVC. Both can predict probabilities, but in very different ways.

Upvotes: 1
