EmJ

Reputation: 4618

How to reduce the number of outliers in OneClassSVM in sklearn in python?

I am using OneClassSVM as follows.

import numpy as np
from sklearn.svm import OneClassSVM

clf = OneClassSVM(random_state=42)
clf.fit(X)
y_pred_train = clf.predict(X)

print(len(np.where(y_pred_train == -1)[0]))

However, more than 50% of my data is flagged as outliers. I would like to know if there is a way to reduce the number of outliers in one-class SVM.

I tried the contamination parameter. However, it seems that OneClassSVM does not support contamination.

Is there any other approach that I can use?

I am happy to provide more details if needed.

Upvotes: 0

Views: 1150

Answers (2)

dgumo

Reputation: 1878

You can control how many data points in your training data are labeled as outliers by controlling the nu parameter of OneClassSVM.

From the API docs, nu is "An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken."

I suggest that you set aside a labeled validation set and then tune your SVM hyperparameters, such as nu, kernel, etc., for best performance on it.
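To make this concrete, here is a rough sketch (using synthetic Gaussian data as a stand-in for your X) showing how lowering nu shrinks the fraction of training points labeled -1. With the default nu=0.5, roughly half the data can be flagged as outliers, which matches the behavior you are seeing:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in data (assumption: replace with your own X).
rng = np.random.RandomState(42)
X = rng.randn(200, 2)

# nu is an upper bound on the fraction of training errors, so a
# smaller nu yields fewer points labeled -1 (outliers).
fractions = {}
for nu in (0.5, 0.1, 0.05):
    clf = OneClassSVM(nu=nu, gamma='scale')
    y_pred = clf.fit_predict(X)
    fractions[nu] = np.mean(y_pred == -1)
    print(f"nu={nu}: {fractions[nu]:.1%} flagged as outliers")
```

On data like this you should see the outlier fraction drop as nu decreases; with real data, pick nu close to the fraction of outliers you actually expect.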

Upvotes: 2

pypypy

Reputation: 1105

I'd be interested in understanding the variance, dimensionality, and number of sample points you are using, but my first suggestion would be to try:

clf = OneClassSVM(random_state=42, gamma='scale')

From the docs:

Current default is ‘auto’ which uses 1 / n_features, if gamma='scale' is passed then it uses 1 / (n_features * X.var()) as value of gamma. The current default of gamma, ‘auto’, will change to ‘scale’ in version 0.22. ‘auto_deprecated’, a deprecated version of ‘auto’ is used as a default indicating that no explicit value of gamma was passed.
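To illustrate why this matters, here is a hedged sketch (on toy data with variance well above 1; the assumed formulas come from the quoted docs) comparing the gamma value that 'auto' would use against the one 'scale' computes. When the features are not unit-variance, the two can differ a lot, which changes how tight the RBF kernel is and hence how many points end up as outliers:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Toy data with variance deliberately != 1.
rng = np.random.RandomState(0)
X = rng.randn(100, 5) * 3.0

gamma_auto = 1.0 / X.shape[1]                # what 'auto' uses
gamma_scale = 1.0 / (X.shape[1] * X.var())   # what 'scale' uses

clf = OneClassSVM(nu=0.1, gamma=gamma_scale)
y_pred = clf.fit_predict(X)

print(f"auto gamma:  {gamma_auto:.4f}")
print(f"scale gamma: {gamma_scale:.4f}")
print(f"outlier fraction: {np.mean(y_pred == -1):.1%}")
```

Since this data has variance around 9, the 'scale' gamma comes out much smaller than the 'auto' one, giving a smoother decision boundary that flags fewer points.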

Upvotes: 1
