Reputation: 1
I'm attempting to implement cross-validation on the results from my KNN classifier. The following code raises a TypeError.
For context, I have already imported the scikit-learn, NumPy, and pandas libraries.
from sklearn.cross_validation import cross_val_score, ShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
n_samples = len(y)
knn = KNeighborsClassifier(3)
cv = ShuffleSplit(n_samples, n_iter=10, test_size=0.3, random_state=0)
test_scores = cross_val_score(knn, X, y, cv=cv)
test_scores.mean()
Returns:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-139-d8cc3ee0c29b> in <module>()
7 cv = ShuffleSplit(n_samples, n_iter=10, test_size=0.3, random_state=0)
8
9 test_scores = cross_val_score(knn, X, y, cv=cv)
10 test_scores.mean()
//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, score_func, pre_dispatch)
1150 delayed(_cross_val_score)(clone(estimator), X, y, scorer, train, test,
1151 verbose, fit_params)
1152 for train, test in cv)
1153 return np.array(scores)
1154
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
515 try:
516 for function, args, kwargs in iterable:
517 self.dispatch(function, args, kwargs)
518
519 self.retrieve()
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch(self, func, args, kwargs)
310 """
311 if self._pool is None:
312 job = ImmediateApply(func, args, kwargs)
313 index = len(self._jobs)
314 if not _verbosity_filter(index, self.verbose):
//anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, func, args, kwargs)
134 # Don't delay the application, to avoid keeping the input
135 # arguments in memory
136 self.results = func(*args, **kwargs)
137
138 def get(self):
//anaconda/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _cross_val_score(estimator, X, y, scorer, train, test, verbose, fit_params)
1056 y_test = None
1057 else:
1058 y_train = y[train]
1059 y_test = y[test]
1060 estimator.fit(X_train, y_train, **fit_params)
TypeError: only integer arrays with one element can be converted to an index
Upvotes: 0
Views: 1719
Reputation: 14377
This error is related to pandas. scikit-learn expects NumPy arrays, sparse matrices, or objects that behave like them.
The main issue with pandas DataFrames is that indexing with [...] selects columns, not rows. Row indexing in pandas is done through DataFrame.loc[...] (by label) or DataFrame.iloc[...] (by position). sklearn does not expect this behaviour. The error most likely comes from line 1058 (y_train = y[train]), where the code fails to extract the training samples.
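A minimal sketch of the indexing difference, assuming the labels sit in a one-column DataFrame. `y_df` and `train` are hypothetical names chosen for illustration, and `train` stands in for the positional indices a cross-validation splitter yields:

```python
import numpy as np
import pandas as pd

# Hypothetical labels stored as a one-column DataFrame. A non-default index
# makes the difference between label-based and position-based lookup visible.
y_df = pd.DataFrame({"label": [0, 1, 0, 1, 1]}, index=[10, 11, 12, 13, 14])

# Positional row indices, like the ones a cross-validation splitter yields.
train = np.array([0, 2, 4])

# [...] on a DataFrame looks up *column* labels, so 0, 2 and 4 are not found.
try:
    y_df[train]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# .iloc selects *rows* by position, which is what the splitter's indices mean.
rows_by_position = y_df.iloc[train]["label"].tolist()

# After converting to a NumPy array, plain [...] indexes rows by position.
y_arr = y_df["label"].values
rows_from_array = y_arr[train].tolist()

print(column_lookup_failed, rows_by_position, rows_from_array)
# True [0, 0, 1] [0, 0, 1]
```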
To fix this, if your y is a single DataFrame column, try converting it to a NumPy array:
y = y.values
Otherwise, pandas-sklearn may be an option.
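For reference, here is a sketch of the same workflow with the conversion applied. It assumes a current scikit-learn, where the `sklearn.cross_validation` module used in the question has been replaced by `sklearn.model_selection` and `ShuffleSplit` takes `n_splits` instead of `n_samples`/`n_iter`. The DataFrame `df` and its columns are made-up stand-in data, not the asker's:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in data: two numeric features and a binary label.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f1": rng.normal(size=60),
    "f2": rng.normal(size=60),
})
df["label"] = (df["f1"] + df["f2"] > 0).astype(int)

# Convert the pandas objects to plain NumPy arrays before passing them to sklearn.
X = df[["f1", "f2"]].values
y = df["label"].values

knn = KNeighborsClassifier(n_neighbors=3)

# n_splits replaces the old n_iter; the number of samples is taken from X.
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)

# One accuracy score per split.
test_scores = cross_val_score(knn, X, y, cv=cv)
print(test_scores.mean())
```

Keeping `random_state=0` on the splitter makes the ten train/test splits reproducible, as in the original code.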
Upvotes: 1