Holden Caulfield

Reputation: 55

sklearn TimeSeriesSplit Error: KeyError: '[ 0 1 2 ...] not in index'

I want to use TimeSeriesSplit from sklearn on my dataframe to predict the 'sum' column.

So to prepare X and y I do the following:

X = df.drop(['sum'],axis=1)
y = df['sum']

and then feed these two to:

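from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit()  # n_splits is not shown in the question; the default is used here
for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X[train_index], X[test_index]
    y_train01, y_test01 = y[train_index], y[test_index]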

Doing so, I get the following error:

KeyError: '[ 0  1  2 ...] not in index'

Here X is a DataFrame, and apparently this causes the error, because if I convert X to an array as follows:

X = X.values

Then it works. However, I need X as a DataFrame to evaluate the model later. Is there any way to keep X as a DataFrame and feed it to tscv without converting it to an array?
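A minimal, self-contained reproduction of the error may make the cause easier to see. This is a sketch under assumptions: the dataframe is a hypothetical stand-in (columns a and b are made up; only the 'sum' target comes from the question) and n_splits=3 is chosen arbitrarily. It only looks at the first fold and checks that X[train_index] raises a KeyError.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the question's dataframe: two made-up feature
# columns plus the 'sum' target column.
df = pd.DataFrame({
    "a": np.arange(10),
    "b": np.arange(10) * 2,
    "sum": np.arange(10) * 3,
})

X = df.drop(["sum"], axis=1)
y = df["sum"]

tscv = TimeSeriesSplit(n_splits=3)  # assumed; the question does not show n_splits
train_index, test_index = next(iter(tscv.split(X)))  # first fold only

raised = False
try:
    # On a DataFrame, [] with a list/array of values is read as column labels,
    # so the integer positions 0, 1, 2, ... are looked up as column names.
    X[train_index]
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

With a NumPy array (X.values) the same expression works because arrays are always indexed by position, which matches the behaviour described in the question.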

Upvotes: 1

Views: 1686

Answers (1)

Vivek Kumar

Reputation: 36619

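As @Jarad rightly said, recent versions of pandas do not automatically fall back to integer-position indexing the way older versions did. On a DataFrame, X[train_index] is interpreted as a list of column labels, so the integer positions returned by tscv.split are looked up as column names, which raises the KeyError (X.values avoids this because a NumPy array is always indexed by position). You need to use .iloc explicitly for integer-position slicing, for both X and y: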

for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X.iloc[train_index], X.iloc[test_index]
    y_train01, y_test01 = y.iloc[train_index], y.iloc[test_index]
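A complete runnable sketch of the .iloc loop follows, to show that each fold stays a DataFrame/Series with its column names and index labels. The data is hypothetical (made-up columns a and b and a made-up daily date index, chosen so it is obvious that .iloc selects by position while the labels are carried along), and n_splits=3 is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical data with a date index; only the 'sum' target comes from the question.
df = pd.DataFrame(
    {"a": np.arange(10), "b": np.arange(10) * 2, "sum": np.arange(10) * 3},
    index=pd.date_range("2020-01-01", periods=10, freq="D"),
)
X = df.drop(["sum"], axis=1)
y = df["sum"]

tscv = TimeSeriesSplit(n_splits=3)  # assumed value
folds = []
for train_index, test_index in tscv.split(X):
    # .iloc selects rows by integer position, which is what split() yields.
    X_train01, X_test01 = X.iloc[train_index], X.iloc[test_index]
    y_train01, y_test01 = y.iloc[train_index], y.iloc[test_index]
    folds.append((X_train01, X_test01, y_train01, y_test01))

# Inspect the last fold: still a DataFrame with the original columns and dates.
X_train01, X_test01, y_train01, y_test01 = folds[-1]
print(type(X_train01).__name__, list(X_train01.columns), X_test01.index[0].date())
```

Because the index labels survive, predictions for a test fold can later be aligned with y_test01 by index when evaluating the model.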

See https://pandas.pydata.org/pandas-docs/stable/indexing.html

Upvotes: 8
