Reputation: 129

Calculating anomaly score for Anomaly detection using One-Class SVM

I have a question about calculating an anomaly score for anomaly detection using One-Class SVM: how can I calculate it using decision_function(X), the same way I calculate the anomaly score in Isolation Forest? Thanks a lot.

Upvotes: 2

Answers (4)

JeeyCi

Reputation: 597

8 Anomaly Detection Methods

1. Isolation Forest method

2. Local Outlier Factor method

3. Elliptical Envelope method

4. One-Class SVM method

5. The DBSCAN method

6. Gaussian Mixture method

7. K-means method

8. Kernel Density method

# https://www.datatechnotes.com/2020/04/anomaly-detection-with-one-class-svm.html
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_blobs
from numpy import quantile, where, random
import matplotlib.pyplot as plt

random.seed(13)
x, _ = make_blobs(n_samples=200, centers=1, cluster_std=.3, center_box=(8, 8))

plt.scatter(x[:,0], x[:,1])
plt.show()

svm = OneClassSVM(kernel='rbf', gamma=0.001, nu=0.02)
# fit the model and get the labels in one step (+1 = inlier, -1 = outlier)
pred = svm.fit_predict(x)
# raw score of each sample; lower values are more abnormal
scores = svm.score_samples(x)

# take the threshold from the scores with the quantile function; here the lowest 3 percent of scores are treated as anomalies.
thresh = quantile(scores, 0.03)
print(thresh)

# extract the anomalies by comparing the threshold value and identify the values of elements.
index = where(scores<=thresh)
values = x[index]

# visualize the results in a plot by highlighting the anomalies with a color.
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()

############## compare with decision function
df = svm.decision_function(x)
print(df)

# Signed distance to the separating hyperplane. Signed distance is positive for an inlier and negative for an outlier.
index = where(df<0)
values = x[index]

# visualize the results in a plot by highlighting the anomalies with a color.
plt.scatter(x[:,0], x[:,1])
plt.scatter(values[:,0], values[:,1], color='r')
plt.show()
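The two approaches above differ only in where the cut-off sits. A minimal sketch of that relationship, assuming a scikit-learn version in which OneClassSVM exposes score_samples and offset_ (the names model, raw and shifted, and the fixed random_state, are illustrative and not part of the original code):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

# Same kind of toy data as above; random_state is fixed only for repeatability.
X, _ = make_blobs(n_samples=200, centers=1, cluster_std=0.3,
                  center_box=(8, 8), random_state=13)

model = OneClassSVM(kernel="rbf", gamma=0.001, nu=0.02).fit(X)

raw = model.score_samples(X)          # unshifted score
shifted = model.decision_function(X)  # score relative to the learned boundary

# Documented relation: decision_function = score_samples - offset_
print(np.allclose(shifted, raw - model.offset_))

# The two cut-offs flag different numbers of points:
n_boundary = int((model.predict(X) == -1).sum())          # governed by nu
n_quantile = int((raw <= np.quantile(raw, 0.03)).sum())   # governed by the chosen quantile
print(n_boundary, n_quantile)
```

So the ranking of the points is the same either way; only the threshold (zero for decision_function, a chosen quantile for score_samples) changes which points get flagged.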

p.s. The Minimum Covariance Determinant (MCD) method is a "highly robust estimator of multivariate location"... it is also covered in the docs, with an example.

p.p.s. Novelty Detection

Upvotes: 1

partizanos

Reputation: 1136

This is a known issue: by default, the scikit-learn implementation does NOT provide an anomaly score.

One way to tackle it is to use decision_function (https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html#sklearn.svm.OneClassSVM.decision_function), transformed as follows:

anomaly_metric[i] = max_value_decision_fn - decision_fn[i]

where i is the i-th data point.
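A minimal sketch of that transformation in Python. The data, the hyperparameters and the variable names are hypothetical and only there to make the example runnable:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))   # hypothetical data set

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X)
decision_fn = clf.decision_function(X)

# Shift and flip: the most "normal" point gets 0, larger values are more anomalous.
anomaly_metric = decision_fn.max() - decision_fn

# Indices of the points, most anomalous first.
ranking = np.argsort(anomaly_metric)[::-1]
print(ranking[:5])
```

The result is non-negative and increases with abnormality, which is the orientation most ranking metrics such as AUC expect when the anomalies are the positive class.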

sources: https://activisiongamescience.github.io/2015/12/23/Unsupervised-Anomaly-Detection-SOD-vs-One-class-SVM/#sklearn-Users-Beware

How to calculate AUC for One Class SVM in python?

Upvotes: 1

Venkatachalam

Reputation: 16966

Yes, in One-Class SVM you have to use decision_function() as the measure of anomaly score.

Have a look at this example; it may give you a better understanding.

clf.decision_function(X_test)
# returns the signed distance to the separating hyperplane.
# Signed distance is positive for an inlier and negative for an outlier.
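As a rough illustration of the call above, here is a small train/test sketch. X_train, X_test and the hyperparameters are made up for the example, and negating the output is just one simple convention for turning it into a "higher means more anomalous" score:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # hypothetical "normal" data
X_test = np.array([[0.0, 0.0],      # near the centre of the training data
                   [10.0, 10.0]])   # far away from it

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

signed = clf.decision_function(X_test)
anomaly_score = -signed   # flip the sign so that larger means more anomalous

print(signed)
print(anomaly_score)
```

The far-away point ends up with a negative decision value and therefore the larger anomaly score of the two.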

Upvotes: 1

Mortz

Reputation: 4939

In Isolation Forest, the anomaly score measures how much the average path length needed to isolate a particular observation deviates from the average path length needed to isolate a "normal" observation.

The average here is taken over all the trees in the forest. Since SVM is not an ensemble method, this notion of anomaly score does not directly apply.
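For reference, this is roughly how the Isolation Forest score is exposed in scikit-learn. The data and variable names are hypothetical; the point is only to show which methods carry the path-length-based score:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))   # hypothetical data set

iso = IsolationForest(n_estimators=100, random_state=0).fit(X)

raw = iso.score_samples(X)          # negated anomaly score: lower = more abnormal
shifted = iso.decision_function(X)  # raw score shifted by offset_; negative = outlier

# Documented relation: decision_function = score_samples - offset_
print(np.allclose(shifted, raw - iso.offset_))
```

Both values are averaged over the trees internally, which is the ensemble step that has no direct counterpart in a single One-Class SVM.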

One way, and I don't know how statistically/scientifically sound this is, of measuring an anomaly score is to build multiple SVM classifiers based on a subset of predictors. You could then use the percentage of times a particular point is classified as an outlier as a proxy for an anomaly score.
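A minimal sketch of that idea, with the same caveat about its statistical soundness. The function name subset_vote_score, its parameters and the toy data are all hypothetical:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def subset_vote_score(X, n_models=25, n_features=2, nu=0.05, seed=0):
    """Fraction of feature-subset models that label each row an outlier."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(X.shape[0])
    for _ in range(n_models):
        # Pick a random subset of predictors for this model.
        cols = rng.choice(X.shape[1], size=n_features, replace=False)
        model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X[:, cols])
        # predict() returns -1 for outliers and +1 for inliers.
        votes += model.predict(X[:, cols]) == -1
    return votes / n_models

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))   # hypothetical data with 5 predictors
scores = subset_vote_score(X)
print(scores[:10])
```

Each score lies between 0 and 1, where 1 means every subset model flagged the point.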

Upvotes: 2
