Stergios

Reputation: 3196

Unstable behavior of OneClassSVM by changing 'nu'

In the example below, I'm using OneClassSVM on my dataset to identify outliers. Slight changes to the nu parameter produce a huge difference in the number of anomalies identified.

[Image: One class SVM]

Could this be just a particularity of the dataset? Or a bug in scikit-learn?

P.S. Unfortunately I cannot share the dataset.

Upvotes: 2

Views: 607

Answers (2)

albertcthomas

Reputation: 21

If you decrease the value of the tol parameter of OneClassSVM, the result is better, although still not completely as expected for low values of nu.

import numpy as np
from sklearn.svm import OneClassSVM

import matplotlib.pyplot as plt

X = np.random.rand(100, 1)
nus = np.geomspace(0.0001, 0.5, num=100)

outlier_fraction = np.zeros(len(nus))
for i, nu in enumerate(nus):
    outlier_fraction[i] = (OneClassSVM(nu=nu, tol=1e-12).fit_predict(X) == -1).mean()

plt.plot(nus, outlier_fraction)
plt.xlabel('nu')
plt.ylabel('Outlier fraction')
plt.show()

[Image: nu_small_tol]

With the default tol, you obtain the following:

[Image]
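To compare the two settings side by side without a plot, a minimal sketch along these lines could be used. The fixed seed, the helper name outlier_fractions, and the particular nu values are assumptions added for illustration; they are not part of the original snippet.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Fixed seed (an assumption) so that repeated runs use the same data
rng = np.random.default_rng(0)
X = rng.random((100, 1))


def outlier_fractions(X, nus, tol):
    """Return the fraction of training points labelled -1 for each nu."""
    return np.array([
        (OneClassSVM(nu=nu, tol=tol).fit_predict(X) == -1).mean()
        for nu in nus
    ])


# A handful of nu values chosen for illustration
nus = np.array([0.001, 0.01, 0.05, 0.1, 0.5])

default_tol = outlier_fractions(X, nus, tol=1e-3)  # 1e-3 is scikit-learn's default tol
small_tol = outlier_fractions(X, nus, tol=1e-12)

for nu, d, s in zip(nus, default_tol, small_tol):
    print(f"nu={nu:<6} default tol: {d:.2f}   tol=1e-12: {s:.2f}")
```

If the two columns diverge mainly at the smallest nu values, that would be consistent with the solver tolerance, rather than the dataset, driving the jump.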

Upvotes: 2

Ray Bell

Reputation: 1618

NOTE: not an answer. Offering an MCVE.

I also came across this recently. I would like to understand the inflection point at low values of nu.

import numpy as np
import pandas as pd
from sklearn.svm import OneClassSVM

X = np.random.rand(100, 1)

nu = np.geomspace(0.0001, 1, num=100)
df = pd.DataFrame(data={'nu': nu})

# Iterate over the rows of df (one per nu value), not over the samples in X
for i in df.index:
    df.loc[i, 'anom_count'] = (OneClassSVM(nu=df.loc[i, 'nu']).fit_predict(X) == -1).sum()

df.set_index('nu').plot();

[Image: anom_count versus nu]

df.set_index('nu').plot(xlim=(0, 0.2));

[Image: anom_count versus nu, with the x-axis limited to 0 to 0.2]

df.anom_count.min() # 3
df.anom_count.idxmin() # 62
df.loc[df.anom_count.idxmin(), 'nu'] # 0.031
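One way to probe the low-nu region further is to look at the support vectors as well as the predicted outliers. The scikit-learn documentation describes nu as an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The sketch below prints both fractions for a few nu values. The seed and the chosen nu values are assumptions for illustration, and the output will differ on other data.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Fixed seed (an assumption) so that repeated runs use the same data
rng = np.random.default_rng(0)
X = rng.random((100, 1))

rows = []
for nu in [0.001, 0.01, 0.03, 0.1, 0.3]:
    model = OneClassSVM(nu=nu).fit(X)
    # Fraction of training points predicted as outliers (-1)
    outlier_frac = (model.predict(X) == -1).mean()
    # Fraction of training points retained as support vectors
    sv_frac = len(model.support_) / len(X)
    rows.append((nu, outlier_frac, sv_frac))
    print(f"nu={nu:<6} outliers: {outlier_frac:.2f}   support vectors: {sv_frac:.2f}")
```

Where the outlier fraction exceeds nu, the fitted model does not respect the documented bound, which suggests looking at solver settings such as tol.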

Upvotes: 1
