Reputation: 41
I'm new to scikit-learn, and SVM methods in general. I've got my data set working well with scikit-learn OneClassSVM in order to detect outliers; I train the OneClassSVM using observation all of which are 'inliers' and then use predict() to generate binary inlier/outlier predictions on my testing set of data.
However to continue further with my analysis I'd like to get the probabilities associated with each new observation in my test set. E.g. The probability of being an outlier associated with each new observation. I've noticed other classification methods in scikit-learn offer the ability to pass the parameter probability=True to compute this, but OneClassSVM does not offer this. Is there an easy way to get these results?
Upvotes: 2
Views: 1460
Reputation: 36
Since version 3.31, libsvm supports probabilistic outputs for one-class SVM: https://www.csie.ntu.edu.tw/~cjlin/libsvm/#nuandone
Upvotes: 0
Reputation: 1136
It provides decision function scores which in theory is the distance from the marginal decision boundary between normal and anomales OCSVM does unsupervised classification. This means that the anomaly inside the algorithm is defined based on the distance to the origin (quoted from Scholkopf's paper from NIPS https://papers.nips.cc/paper/1999/file/8725fb777f25776ffa9076e44fcfd776-Paper.pdf).
TLDR: use
clf.decision_function(samples) * (-1)
as scores. you get a sparse distributiion of scores.
Upvotes: 1
Reputation: 396
I was searching for an answer for the same question of yours until I got to this page. Stuck for sometime, then, I went back to check the original LIBSVM package since OneClassSVM of scikit-learn is based on the implementation of LIBSVM as stated here.
At the main page of LIBSVM, they state the following for option '-b' that is used to activate returning probability output scores for some variants of SVM: -b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0) In other words, the one-class SVM which is of type SVM (neither SVC nor SVR) does not have implementation for probability estimation.
If I go and try to force this option (i.e. -b) using the command line interface of LIBSVM, for example: ./svm-train -s 2 -t 2 -b 1 heart_scale
I receive the following error message: ERROR: one-class SVM probability output not supported yet
In summary, this very desired output is not yet supported by LIBSVM and thus, scikit-learn is not offering it for the moment. I hope in near future, they activate this functionality and update the thread here.
Upvotes: 1