EmJ

Reputation: 4618

How to reduce the number of outliers in OneClassSVM in sklearn in python?

I am using OneClassSVM as follows.

import numpy as np
from sklearn.svm import OneClassSVM

clf = OneClassSVM(random_state=42)
clf.fit(X)
y_pred_train = clf.predict(X)

print(len(np.where(y_pred_train == -1)[0]))

However, more than 50% of my data is flagged as outliers. I would like to know if there is a way to reduce the number of outliers in one-class SVM.

I tried the contamination parameter. However, it seems that OneClassSVM does not support contamination.

Is there any other approach that I can use?

I am happy to provide more details if needed.

Upvotes: 0

Views: 1150

Answers (2)

dgumo

Reputation: 1878

You can control how many data points in your training data are labeled as outliers by controlling the nu parameter of OneClassSVM.

From the API docs, nu is "An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken."

I suggest that you set aside a labeled validation set and then tune your SVM hyperparameters, such as nu, kernel, etc., for best performance on it.
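To make this concrete, here is a rough sketch (using synthetic Gaussian data as a stand-in for your X) showing how lowering nu shrinks the fraction of training points labeled -1. With the default nu=0.5, roughly half the data can be flagged as outliers, which matches the behavior you are seeing:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in data (assumption: replace with your own X).
rng = np.random.RandomState(42)
X = rng.randn(200, 2)

# nu is an upper bound on the fraction of training errors, so a
# smaller nu yields fewer points labeled -1 (outliers).
fractions = {}
for nu in (0.5, 0.1, 0.05):
    clf = OneClassSVM(nu=nu, gamma='scale')
    y_pred = clf.fit_predict(X)
    fractions[nu] = np.mean(y_pred == -1)
    print(f"nu={nu}: {fractions[nu]:.1%} flagged as outliers")
```

On data like this you should see the outlier fraction drop as nu decreases; with real data, pick nu close to the fraction of outliers you actually expect.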

Upvotes: 2

pypypy

Reputation: 1105

I'd be interested in understanding the variance, dimensionality, and number of sample points you are using, but my first suggestion would be to try:

clf = OneClassSVM(random_state=42, gamma='scale')

From the docs:

Current default is ‘auto’ which uses 1 / n_features, if gamma='scale' is passed then it uses 1 / (n_features * X.var()) as value of gamma. The current default of gamma, ‘auto’, will change to ‘scale’ in version 0.22. ‘auto_deprecated’, a deprecated version of ‘auto’ is used as a default indicating that no explicit value of gamma was passed.
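To illustrate why this matters, here is a hedged sketch (on toy data with variance well above 1; the assumed formulas come from the quoted docs) comparing the gamma value that 'auto' would use against the one 'scale' computes. When the features are not unit-variance, the two can differ a lot, which changes how tight the RBF kernel is and hence how many points end up as outliers:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Toy data with variance deliberately != 1.
rng = np.random.RandomState(0)
X = rng.randn(100, 5) * 3.0

gamma_auto = 1.0 / X.shape[1]                # what 'auto' uses
gamma_scale = 1.0 / (X.shape[1] * X.var())   # what 'scale' uses

clf = OneClassSVM(nu=0.1, gamma=gamma_scale)
y_pred = clf.fit_predict(X)

print(f"auto gamma:  {gamma_auto:.4f}")
print(f"scale gamma: {gamma_scale:.4f}")
print(f"outlier fraction: {np.mean(y_pred == -1):.1%}")
```

Since this data has variance around 9, the 'scale' gamma comes out much smaller than the 'auto' one, giving a smoother decision boundary that flags fewer points.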

Upvotes: 1
