Reputation: 1548
I would like to use cross_val_score to validate my OneClassSVM training set. Doing so results in the below error message.
Can it be that because OneClassSVM is an unsupervised algorithm and does not have a "y" vector to pass to cross_val_score, the algorithm fails?
clf = svm.OneClassSVM(nu=_nu, kernel=_kernel, gamma=_gamma, random_state=_random_state, cache_size=_cache_size)
scores = cross_val_score(estimator=clf, X=X_scaled, scoring='accuracy', cv=5)
PS: I realize the "y" vector is optional in cross_val_score. But still, the error leads me to hypothesize the "y" vector causes the error.
File "/usr/local/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 140, in cross_val_score
for train, test in cv_iter)
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 758, in __call__
while self.dispatch_one_batch(iterator):
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 608, in dispatch_one_batch
self._dispatch(tasks)
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 571, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 109, in apply_async
result = ImmediateResult(func)
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 326, in __init__
self.results = batch()
File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/usr/local/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 260, in _fit_and_score
test_score = _score(estimator, X_test, y_test, scorer)
File "/usr/local/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 286, in _score
score = scorer(estimator, X_test)
TypeError: __call__() takes at least 4 arguments (3 given)
Upvotes: 2
Views: 3395
Reputation: 36599
I am assuming that you are using OneClassSVM for outlier detection (the purpose it was implemented for in scikit-learn), and not for a classification task.
The documentation of cross_val_score says this about y:
y : array-like, optional, default: None
The target variable to try to predict in the case of supervised learning.
Note the phrase "supervised learning" in there.
So when you do:
clf = svm.OneClassSVM(nu=_nu, kernel=_kernel, gamma=_gamma,
random_state=_random_state, cache_size=_cache_size)
scores = cross_val_score(estimator=clf, X=X_scaled, scoring='accuracy', cv=5)
You are right in your assumption that OneClassSVM is an unsupervised model and does not need the y parameter. All good so far.
But you also set the scoring parameter to "accuracy". This is where the error is coming from. When you use the string "accuracy", the default [accuracy_score](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) metric is used, which has the signature:
accuracy_score(y_true, y_pred, ... ...)
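There, the actual and predicted y are required (not optional). Because no y was supplied, cross_val_score calls the scorer without the true labels (see the line score = scorer(estimator, X_test) in your traceback), and the scorer's __call__() fails with the TypeError.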
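A minimal sketch of that failure outside cross_val_score, assuming a recent scikit-learn and synthetic data (the array X and the parameter values here are made up for illustration, not taken from the question). It looks up the scorer that the string "accuracy" resolves to and calls it without labels, the same way the traceback shows:

```python
import numpy as np
from sklearn.metrics import get_scorer
from sklearn.svm import OneClassSVM

# Synthetic data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 2))

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1).fit(X)

# The scorer object that scoring='accuracy' resolves to.
accuracy_scorer = get_scorer("accuracy")

# With y=None, the scorer is called without true labels, like this:
try:
    accuracy_scorer(clf, X)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

The exact wording of the message depends on the Python and scikit-learn versions, but the cause is the same: the scorer requires the true labels as a positional argument.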
Hope you get my point.
Solution:
As stated in this answer here, "In one-class SVM the notion of accuracy is out of place." But if you are still intent on using "accuracy", then you need to have the ground truth for the supplied data ready as y. Basically, y should consist of +1 or -1, depending on whether the actual sample is an inlier or an outlier.
I used +1 and -1 because OneClassSVM.predict() returns values like that:
predict(X)
Perform regression on samples in X. For an one-class model, +1 or -1 is returned.
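As a rough sketch of the ground-truth approach, assuming you can label your samples yourself: the data below is synthetic, and the names X_inliers, X_outliers and the parameter values are hypothetical. The labels use +1 for inliers and -1 for outliers so that they match what predict() returns:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Hypothetical data: 90 inliers around the origin, 10 outliers far away.
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(90, 2))
X_outliers = rng.uniform(low=6.0, high=8.0, size=(10, 2))
X = np.vstack([X_inliers, X_outliers])

# Ground truth in the same encoding that OneClassSVM.predict() uses.
y = np.concatenate([np.ones(90, dtype=int), -np.ones(10, dtype=int)])

# Shuffle so that the outliers are spread over the folds.
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)

# y is ignored by fit() but is passed on to the accuracy scorer.
scores = cross_val_score(clf, X, y, scoring="accuracy", cv=5)
print(scores)
```

Note that OneClassSVM is not treated as a classifier, so cv=5 gives plain (unstratified) folds; with few outliers, some folds may contain none.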
Otherwise, you need to find another scoring metric that can give you a meaningful score for your predictions on X without the actual ground truth y, or devise your own scoring method to evaluate the outlier detection on your data.
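One possible shape for such a custom scoring method is a callable with the signature (estimator, X, y=None) passed as scoring. The sketch below assumes synthetic data, and the function inlier_fraction is a made-up illustration, not a recommended metric: it only reports the share of held-out samples predicted as inliers, which you might compare against 1 - nu.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import OneClassSVM

# Synthetic data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))


def inlier_fraction(estimator, X, y=None):
    """Hypothetical scorer: fraction of samples predicted as inliers (+1)."""
    return float(np.mean(estimator.predict(X) == 1))


clf = OneClassSVM(nu=0.1, kernel="rbf", gamma=0.1)

# No y is passed, so the callable is invoked as scorer(estimator, X_test).
scores = cross_val_score(clf, X, scoring=inlier_fraction, cv=5)
print(scores)
```

Giving y a default of None is what lets the same callable work with or without labels.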
Feel free to ask if you need any more help.
Upvotes: 5