How to use TimeSeriesSplit in cv as mentioned in the documentation of scikit-learn

Question

I am trying to use a 10-fold TimeSeriesSplit(), but the documentation of cross_val_score says that cv must be a cross-validation generator or an iterable.

tss = TimeSeriesSplit(max_train_size=None, n_splits=10)
l = []
neighb = [1, 3, 5, 7, 9, 11, 13, 12, 23, 19, 18]
for k in neighb:
    knn = KNeighborsClassifier(n_neighbors=k, algorithm='brute')
    sc = cross_val_score(knn, X1, y1, cv=tss, scoring='accuracy')
    l.append(sc.mean())

How should I pass the TimeSeriesSplit to cv so that the data is split into train and test folds? Currently I get this error:

TypeError                                 Traceback (most recent call last)
 in ()
     14 for k in neighb:
     15     knn = KNeighborsClassifier(n_neighbors=k, algorithm='brute')
---> 16     sc = cross_val_score(knn, X1, y1, cv=tss, scoring='accuracy')
     17     l.append(sc.mean())
     18 

~\Anaconda3\lib\site-packages\sklearn\cross_validation.py in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
   1579                                               train, test, verbose, None,
   1580                                               fit_params)
-> 1581                       for train, test in cv)
   1582     return np.array(scores)[:, 0]
   1583 
TypeError: 'TimeSeriesSplit' object is not iterable

Vivek Kumar · Accepted Answer

Just pass tss to cv.

scores = cross_val_score(knn, X_train, y_train, cv=tss , scoring='accuracy')

No need to call tss.split().

Update: The above method was tested on scikit-learn v0.19.1, so make sure you have an up-to-date version. Also, I am importing TimeSeriesSplit from the model_selection module.
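For reference, here is a minimal self-contained sketch of the call above. The data is synthetic and made up purely for illustration (X_train and y_train here are random arrays, not the asker's data), and the sample count is an assumption chosen so that every training fold has at least n_neighbors samples.

```python
import numpy as np
import sklearn
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Check which version is installed; model_selection exists from 0.18 onwards.
print(sklearn.__version__)

# Synthetic, time-ordered stand-in data (assumption: 220 samples, 3 features).
rng = np.random.RandomState(0)
X_train = rng.randn(220, 3)
y_train = (X_train[:, 0] + 0.5 * rng.randn(220) > 0).astype(int)

# Pass the splitter object itself to cv; do not call tss.split() yourself.
tss = TimeSeriesSplit(n_splits=10)
knn = KNeighborsClassifier(n_neighbors=5, algorithm='brute')
scores = cross_val_score(knn, X_train, y_train, cv=tss, scoring='accuracy')

print(scores)         # one accuracy value per split
print(scores.mean())  # average accuracy over the 10 splits
```

cross_val_score calls tss.split() internally, which is why the splitter object is all it needs.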

Edit 1:

You are using this now:

tss = TimeSeriesSplit(n_splits=10).split(X_1)
kn = KNeighborsClassifier(n_neighbors=5, algorithm='brute') 
sc = cross_val_score(kn, X1, y1, cv=tss, scoring='accuracy')

But in the question you posted you did this:

tss = TimeSeriesSplit(n_splits=10)

Note the difference between them: split() is not called in the second one. I pass this tss to cross_val_score() directly, without split(), exactly as in the code you posted in the question.
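To make the distinction concrete, here is a small sketch of what the two objects are. X_demo is a hypothetical 12-sample array used only for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical tiny series: 12 time-ordered samples, 1 feature.
X_demo = np.arange(12).reshape(-1, 1)

# The splitter object: this is what I pass to cv=.
splitter = TimeSeriesSplit(n_splits=3)

# Calling split() returns a generator of (train_indices, test_indices) pairs.
folds = splitter.split(X_demo)

fold_list = list(folds)
for train_idx, test_idx in fold_list:
    # In every fold the training indices come before the test indices.
    print("train:", train_idx, "test:", test_idx)

# A generator can be consumed only once; a second pass yields nothing.
leftover = list(folds)
print(len(leftover))
```

Because the generator is used up after one pass, reusing it across several cross_val_score() calls (for example inside a loop over k) is likely to cause trouble, whereas the splitter object can be reused as often as needed.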

Edit 2:

Dude, you are using the deprecated module. Currently you are doing this:

from sklearn.cross_validation import cross_val_score

This is wrong. You should get a warning like this:

DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.

Pay attention to that, and use the model_selection module like this:

from sklearn.model_selection import cross_val_score

Then you will not get the error with my code.
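Putting it together, the loop from the question would look roughly like this with the corrected imports. This is only a sketch: X1 and y1 below are synthetic placeholders for your data, and 550 samples is an assumption chosen so that the smallest training fold (50 samples) is larger than the biggest k (23). I also renamed the list l to mean_scores for readability.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic placeholders for the asker's X1 and y1 (assumption, not real data).
rng = np.random.RandomState(42)
X1 = rng.randn(550, 4)
y1 = (X1[:, 0] + 0.5 * rng.randn(550) > 0).astype(int)

tss = TimeSeriesSplit(n_splits=10)
neighb = [1, 3, 5, 7, 9, 11, 13, 12, 23, 19, 18]

mean_scores = []
for k in neighb:
    knn = KNeighborsClassifier(n_neighbors=k, algorithm='brute')
    # The same splitter object is reused for every k.
    sc = cross_val_score(knn, X1, y1, cv=tss, scoring='accuracy')
    mean_scores.append(sc.mean())

# Pick the k with the highest mean accuracy across the time-series splits.
best_k = neighb[int(np.argmax(mean_scores))]
print(best_k)
```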
