Reputation: 55
I want to use TimeSeriesSplit from sklearn on the following dataframe to predict sum:
So to prepare X and y I do the following:
X = df.drop(['sum'],axis=1)
y = df['sum']
and then feed these two to:
for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X[train_index], X[test_index]
    y_train01, y_test01 = y[train_index], y[test_index]
by doing so, I get the following error:
KeyError: '[ 0 1 2 ...] not in index'
Here X is a dataframe, and apparently this causes the error, because if I convert X to an array as follows:
X = X.values
Then it will work. However, for later evaluation of the model I need X as a dataframe. Is there any way that I can keep X as a dataframe and feed it to tscv without converting it to an array?
Upvotes: 1
Views: 1686
Reputation: 36619
As @Jarad rightly said, if you have an updated version of pandas, it will not automatically switch to integer-based indexing as was possible in previous versions. You need to explicitly use .iloc for integer-based slicing:
for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X.iloc[train_index], X.iloc[test_index]
    y_train01, y_test01 = y.iloc[train_index], y.iloc[test_index]
See https://pandas.pydata.org/pandas-docs/stable/indexing.html
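For anyone who wants to try this end to end, here is a minimal sketch of the .iloc approach. The original dataframe is not shown in the question, so the columns "a" and "b", the ten rows, and n_splits=3 are all made up for illustration. The reason the plain X[train_index] fails is that square brackets on a DataFrame look up column labels, so pandas reports that the integers are not column names; .iloc selects rows by position instead and keeps each piece as a DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical data: "a" and "b" are placeholder feature columns,
# "sum" is the target named in the question.
df = pd.DataFrame({
    "a": np.arange(10),
    "b": np.arange(10) * 2,
    "sum": np.arange(10) * 3,
})

X = df.drop(columns=["sum"])
y = df["sum"]

# n_splits=3 is an arbitrary choice for this sketch.
tscv = TimeSeriesSplit(n_splits=3)

fold_sizes = []
for train_index, test_index in tscv.split(X):
    # .iloc selects rows by integer position, so X stays a DataFrame
    # and y stays a Series in every fold.
    X_train01, X_test01 = X.iloc[train_index], X.iloc[test_index]
    y_train01, y_test01 = y.iloc[train_index], y.iloc[test_index]
    fold_sizes.append((len(X_train01), len(X_test01)))
```

Because the splits keep their column names and original index, you can still refer to features by name when evaluating the model later.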
Upvotes: 8