Reputation: 4618
I am using OneClassSVM as follows.
import numpy as np
from sklearn.svm import OneClassSVM

clf = OneClassSVM(random_state=42)
clf.fit(X)  # X is my training data
y_pred_train = clf.predict(X)
print(len(np.where(y_pred_train == -1)[0]))
However, more than 50% of my data is flagged as outliers. Is there a way to reduce the number of outliers in one-class SVM?
I tried the contamination parameter; however, it seems OneClassSVM does not support contamination.
Is there any other approach that I can use?
I am happy to provide more details if needed.
Upvotes: 0
Views: 1150
Reputation: 1878
You can control how many data points in your training data are labeled as outliers via the nu parameter of OneClassSVM.
From the API docs, nu is "An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken." That default of 0.5 is exactly why roughly half of your data comes back as outliers.
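As a minimal sketch on made-up data (0.05 is just an illustrative value, not a recommendation), lowering nu directly lowers the ceiling on how many training points get labeled -1:

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X = rng.randn(200, 2)  # toy stand-in for your training data

# nu upper-bounds the fraction of training errors (points predicted -1),
# so the default of 0.5 is what allows half your data to be flagged
clf = OneClassSVM(nu=0.05)
clf.fit(X)
y_pred_train = clf.predict(X)
print((y_pred_train == -1).sum())  # roughly at most 5% of 200 points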
I suggest that you hold out a labeled validation set and then tune your SVM hyperparameters, like nu, kernel, etc., for best performance on that validation set; a sketch of such a loop follows.
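A minimal sketch of that tuning loop, assuming you can build a small validation set where +1 marks inliers and -1 marks known outliers (all data below is synthetic, purely for illustration):

import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import f1_score

rng = np.random.RandomState(0)
X_train = rng.randn(300, 2)                       # unlabeled training data
X_val = np.vstack([rng.randn(90, 2),              # validation inliers
                   rng.uniform(-6, 6, (10, 2))])  # injected outliers
y_val = np.r_[np.ones(90), -np.ones(10)]          # +1 inlier, -1 outlier

best_score, best_params = -1.0, None
for nu in (0.01, 0.05, 0.1, 0.2):
    for kernel in ('rbf', 'sigmoid'):
        clf = OneClassSVM(nu=nu, kernel=kernel, gamma='scale').fit(X_train)
        # score outlier detection on the labeled validation set
        score = f1_score(y_val, clf.predict(X_val), pos_label=-1)
        if score > best_score:
            best_score, best_params = score, {'nu': nu, 'kernel': kernel}

print(best_params, best_score)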
Upvotes: 2
Reputation: 1105
I'd be interested in understanding the variance, dimensionality, and number of sample points you are using, but my first suggestion would be to try:
clf = OneClassSVM(random_state=42, gamma='scale')
From the docs:
Current default is ‘auto’ which uses 1 / n_features, if gamma='scale' is passed then it uses 1 / (n_features * X.var()) as value of gamma. The current default of gamma, ‘auto’, will change to ‘scale’ in version 0.22. ‘auto_deprecated’, a deprecated version of ‘auto’ is used as a default indicating that no explicit value of gamma was passed.
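For intuition, here is a small sketch (on made-up data) of how the two defaults from that quote differ: when the feature variance is large, 'scale' yields a much smaller gamma, i.e. a smoother RBF boundary, which often changes how many points land outside it.

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 5) * 10  # toy data with large variance

n_features = X.shape[1]
gamma_auto = 1.0 / n_features               # old 'auto' default
gamma_scale = 1.0 / (n_features * X.var())  # what gamma='scale' computes
print(gamma_auto, gamma_scale)              # 0.2 vs roughly 0.002 here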
Upvotes: 1