Reputation: 3196
In the example above, I'm using my dataset to identify outliers. Slight changes to the nu parameter lead to a huge difference in the number of anomalies identified.
Could this be just a particularity of the dataset? Or a bug in scikit-learn?
P.S. Unfortunately I cannot share the dataset.
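Since the dataset can't be shared, here is a hypothetical reproduction on random data (a stand-in, not the original data) showing the same kind of jump:
import numpy as np
from sklearn.svm import OneClassSVM

# Random stand-in data; the real dataset is not available
rng = np.random.RandomState(0)
X = rng.rand(100, 1)

# Two nearby nu values can flag very different numbers of anomalies
for nu in (0.01, 0.02):
    n_anomalies = (OneClassSVM(nu=nu).fit_predict(X) == -1).sum()
    print(f"nu={nu}: {n_anomalies} anomalies")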
Upvotes: 2
Views: 607
Reputation: 21
If you decrease the value of the tol parameter of OneClassSVM, the result is better, although still not entirely as expected for low values of nu.
import numpy as np
from sklearn.svm import OneClassSVM
import matplotlib.pyplot as plt

# 100 uniform random points in 1D
X = np.random.rand(100, 1)

# Sweep nu on a log scale and record the fraction of points flagged as outliers
nus = np.geomspace(0.0001, 0.5, num=100)
outlier_fraction = np.zeros(len(nus))
for i, nu in enumerate(nus):
    # Tight tol so the solver converges far enough for the
    # outlier fraction to track nu
    outlier_fraction[i] = (OneClassSVM(nu=nu, tol=1e-12).fit_predict(X) == -1).mean()

plt.plot(nus, outlier_fraction)
plt.xlabel('nu')
plt.ylabel('Outlier fraction')
plt.show()
With the default tol you obtain a much noisier curve, and the outlier fraction no longer tracks nu.
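A plausible explanation (my reading, not an official one): for one-class SVMs, nu should upper-bound the fraction of training points labelled outliers and lower-bound the fraction of support vectors (the nu-property), and these bounds only hold once the solver has converged tightly, so a loose tol lets the outlier fraction drift away from nu. A quick sanity check of the bounds, assuming a tight tolerance is enough for convergence:
import numpy as np
from sklearn.svm import OneClassSVM

X = np.random.rand(1000, 1)
nu = 0.1
clf = OneClassSVM(nu=nu, tol=1e-12).fit(X)

outlier_frac = (clf.predict(X) == -1).mean()  # expected <= nu, roughly equal to it
sv_frac = len(clf.support_) / len(X)          # expected >= nu
print(outlier_frac, nu, sv_frac)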
Upvotes: 2
Reputation: 1618
NOTE: not an answer. Offering an MCVE.
I also recently came across this. I would like to understand the inflection point at low values of nu.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import OneClassSVM

# 100 uniform random points in 1D
X = np.random.rand(100, 1)

# Sweep nu on a log scale and count the anomalies flagged at each value
nu = np.geomspace(0.0001, 1, num=100)
df = pd.DataFrame(data={'nu': nu})
for i in range(len(nu)):  # was range(0, len(X)): iterate over the nu grid, not the data
    df.loc[i, 'anom_count'] = (OneClassSVM(nu=df.loc[i, 'nu']).fit_predict(X) == -1).sum()

df.set_index('nu').plot()
df.set_index('nu').plot(xlim=(0, 0.2))
plt.show()

df.anom_count.min()                      # 3
df.anom_count.idxmin()                   # 62
df.loc[df.anom_count.idxmin(), 'nu']     # 0.031
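Building on the tol answer above, a short comparison (a sketch on the same kind of random data, not the original dataset) suggests the inflection is at least partly a convergence artifact rather than a property of the data:
import numpy as np
from sklearn.svm import OneClassSVM

X = np.random.rand(100, 1)

# Compare the default tol against a tight tol at a few low nu values
for nu in (0.001, 0.01, 0.05):
    default_count = (OneClassSVM(nu=nu).fit_predict(X) == -1).sum()
    tight_count = (OneClassSVM(nu=nu, tol=1e-12).fit_predict(X) == -1).sum()
    print(nu, default_count, tight_count)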
Upvotes: 1