Reputation: 129
I have a question about calculating an anomaly score for anomaly detection using One-Class SVM. How can I calculate it using decision_function(X), the same way I calculate the anomaly score in Isolation Forest? Thanks a lot.
Upvotes: 2
Views: 3876
Reputation: 597
1. Isolation Forest method
2. Local Outlier Factor method
3. Elliptical Envelope method
4. One-Class SVM method
5. The DBSCAN method
6. Gaussian Mixture method
7. K-means method
8. Kernel Density method
# https://www.datatechnotes.com/2020/04/anomaly-detection-with-one-class-svm.html
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt
random.seed(13)
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(8, 8))
plt.scatter(x[:,0], x[:,1])
plt.show()
svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.02)
svm.fit(x)
pred = svm.predict(x)  # +1 for inliers, -1 for outliers
# calculate the outliers according to the score value of each element.
scores = svm.score_samples(x)
# get the threshold value from the scores using the quantile function; here the lowest 3 percent of scores are treated as anomalies.
thresh = quantile(scores, 0.03)
print(thresh)
# extract the anomalies by comparing each score with the threshold, then look up the corresponding data points.
index = where(scores<=thresh)
values = x[index]
# visualize the results in a plot by highlighting the anomalies with a color.
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()
# compare with the decision function
df = svm.decision_function(x)
print(df)
# Signed distance to the separating hyperplane. Signed distance is positive for an inlier and negative for an outlier.
index = where(df<0)
values = x[index]
# visualize the results in a plot by highlighting the anomalies with a color.
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()
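The two thresholds above (a quantile of score_samples versus decision_function < 0) are closely related. A minimal sketch, assuming the relation given in the scikit-learn documentation, decision_function = score_samples - offset_; the variable names and the fixed random_state are only for illustration:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

# Same kind of toy data as above; random_state is fixed only for repeatability.
X, _ = make_blobs(n_samples=200, centers=1, cluster_std=0.3,
                  center_box=(8, 8), random_state=13)

clf = OneClassSVM(kernel="rbf", gamma=0.001, nu=0.02).fit(X)

raw = clf.score_samples(X)       # unshifted score
dec = clf.decision_function(X)   # shifted so that the learned boundary sits at 0

# Documented relation: decision_function = score_samples - offset_
same_up_to_shift = np.allclose(dec, raw - clf.offset_)

# The two rules differ only in where the cut is placed.
n_flagged_quantile = int((raw <= np.quantile(raw, 0.03)).sum())  # cut chosen by you
n_flagged_boundary = int((dec < 0).sum())                        # cut learned via nu

print(same_up_to_shift, n_flagged_quantile, n_flagged_boundary)
```

Since the shift is a constant, both functions rank the points the same way; only the cut-off differs.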
p.s. Minimum Covariance Determinant (MCD) method is a "highly robust estimator of multivariate location"... or in docs with example
p.p.s. Novelty Detection
Upvotes: 1
Reputation: 1136
This is a known issue: by default, the scikit-learn implementation does NOT provide an anomaly score.
One way to tackle this is to use the decision_function (https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html#sklearn.svm.OneClassSVM.decision_function), but in the following way:
anomaly_metric[i] = max_value_decision_fn - decision_fn[i]
where i is the i-th data point.
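A minimal sketch of that formula with NumPy. The helper name and the sample decision values are made up for illustration; in practice the input would be the output of decision_function on your data:

```python
import numpy as np

def anomaly_metric(decision_values):
    """Flip decision_function output so that larger means more anomalous."""
    d = np.asarray(decision_values, dtype=float)
    return d.max() - d

# Hypothetical decision_function outputs: positive = inlier, negative = outlier.
decision_fn = np.array([0.8, 0.5, 0.1, -0.3, -1.2])
scores = anomaly_metric(decision_fn)
print(scores)
```

The result is non-negative, and the most "normal" point gets 0. Because the transformation is just a constant minus the decision value, ranking-based metrics such as ROC AUC come out the same as with -decision_fn.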
See also: How to calculate AUC for One Class SVM in python?
Upvotes: 1
Reputation: 16966
Yes, you have to use decision_function() as the measure of anomaly score in One-Class SVM.
Have a look at this example; it might give you a better understanding.
clf.decision_function(X_test)
# returns the signed distance to the separating hyperplane.
# Signed distance is positive for an inlier and negative for an outlier.
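As a rough illustration of the call above in a novelty-detection setting, the sketch below fits on "normal" data only and then scores unseen points. The data, parameters, and variable names are assumptions for the example, not taken from the linked page:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # "normal" data only
X_test = np.array([[0.0, 0.0],      # near the centre of the training data
                   [10.0, 10.0]])   # far away from it

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

dec = clf.decision_function(X_test)  # > 0 on the inlier side, < 0 on the outlier side
labels = clf.predict(X_test)         # +1 inlier, -1 outlier
anomaly_score = -dec                 # flipped so that larger means more anomalous

print(dec, labels, anomaly_score)
```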
Upvotes: 1
Reputation: 4939
In Isolation Forest, the anomaly score measures how much the average path length required to isolate a particular observation deviates from the average path length required to isolate a "normal" observation.
The average here is taken over all the different trees that are used. Since SVM is not an ensemble method, this notion of anomaly score does not directly apply.
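For comparison, here is a small sketch of how that Isolation Forest score can be obtained in scikit-learn. It relies on score_samples being documented as the opposite of the anomaly score from the original paper, so negating it gives a value in (0, 1] where higher means more anomalous. The toy data is invented for the example:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# 200 "normal" points plus one obvious outlier in the last row.
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])

iso = IsolationForest(n_estimators=100, random_state=0).fit(X)

# score_samples is the opposite of the paper's score, so negate it.
paper_score = -iso.score_samples(X)

print(paper_score[-1], np.median(paper_score))
```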
One way of measuring an anomaly score, and I don't know how statistically/scientifically sound this is, is to build multiple SVM classifiers, each based on a different subset of predictors. You could then use the percentage of times a particular point is classified as an outlier as a proxy for an anomaly score.
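A rough sketch of that idea, under several assumptions: random feature subsets of a fixed size, an RBF kernel, and a hypothetical helper named subset_vote_score. It illustrates the voting scheme only and is not a validated method:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def subset_vote_score(X_train, X_test, n_models=20, n_features_per_model=2,
                      nu=0.05, seed=0):
    """Fraction of feature-subset models that label each test point an outlier."""
    rng = np.random.RandomState(seed)
    n_features = X_train.shape[1]
    votes = np.zeros(X_test.shape[0])
    for _ in range(n_models):
        # Pick a random subset of predictors for this model.
        cols = rng.choice(n_features, size=n_features_per_model, replace=False)
        clf = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X_train[:, cols])
        # predict returns -1 for outliers; count those votes.
        votes += clf.predict(X_test[:, cols]) == -1
    return votes / n_models

rng = np.random.RandomState(1)
X_train = rng.normal(size=(300, 5))
X_test = np.vstack([np.zeros(5),        # centre of the training data
                    np.full(5, 10.0)])  # far from it in every feature
scores = subset_vote_score(X_train, X_test)
print(scores)
```

The score is a fraction between 0 and 1. A point that is far from the training data in every feature is flagged by every model.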
Upvotes: 2