Reputation: 1
I am not a native English speaker and am using a translator, so please bear with me if any sentence is awkward or hard to read.
I am trying to train a model with k-fold cross-validation, but I keep getting errors while splitting the training data into folds. The following code sets up my data set.
df_test = df_data.iloc[50001:, :] #Test set
df_use = df_data.iloc[0:50000, :] #Training set
x_test = df_test.drop(['upgraded'], axis = 1)
y_test = df_test['upgraded']
x = df_use.drop(['upgraded'], axis = 1)
y = df_use['upgraded']
Every time I try to split this into training and validation data, an error occurs.
for train_ix, val_ix in kfold.split(x):
    trainX, trainy = x[train_ix], y[train_ix]
    valX, valy = x[val_ix], y[val_ix]
    model, val_acc = evaluate_model(trainX, trainy, valX, valy)
I'm not sure whether this helps, but the line trainX, trainy = x[train_ix], y[train_ix]
raises this error:
KeyError: "None of [Int64Index([10000, 10001, 10002, 10003, 10004, 10005, 10006, 10007, 10008,\n 10009,\n ...\n 49990, 49991, 49992, 49993, 49994, 49995, 49996, 49997, 49998,\n 49999],\n dtype='int64', length=40000)] are in the [columns]"
So I changed the code to use iloc:
for train_ix, val_ix in kfold.split(x):
    trainX, valX = x.iloc[train_ix], x.iloc[val_ix]
    trainy, valy = y.iloc[train_ix], y.iloc[val_ix]
    model, val_acc = evaluate_model(trainX, trainy, valX, valy)
This time the split succeeds, but the line model, val_acc = evaluate_model(trainX, trainy, valX, valy)
raises this error:
IndexError: index -9223372036854775808 is out of bounds for axis 1 with size 2
I also tried the following, after splitting df_use into x_train, x_val, y_train and y_val with train_test_split. The same IndexError occurs.
inputs = np.concatenate((x_train, x_val), axis=0)
targets = np.concatenate((y_train, y_val), axis=0)
I want to split the data correctly and pass it in so that the k-fold cross-validation loop accepts it and the model can run. Any help would be much appreciated.
Upvotes: 0
Views: 2501
Reputation: 583
You can try the following, which uses take to select rows by position:
from sklearn.model_selection import KFold
# iloc[0:50000] covers positions 0..49999, so the test set starts at 50000 (not 50001).
df_test = df_data.iloc[50000:, :]  # Test set
df_use = df_data.iloc[0:50000, :]  # Training set
y_test = df_test['upgraded']
x_test = df_test.drop(['upgraded'], axis=1)
y = df_use['upgraded']
x = df_use.drop(['upgraded'], axis=1)
kf = KFold(n_splits=2)
for train_index, test_index in kf.split(x):
    # kf.split yields row positions, so select by position along axis 0
    trainX, valX = x.take(list(train_index), axis=0), x.take(list(test_index), axis=0)
    trainy, valy = y.take(list(train_index), axis=0), y.take(list(test_index), axis=0)
    model, val_acc = evaluate_model(trainX, trainy, valX, valy)
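On why the original x[train_ix] failed: square-bracket indexing on a DataFrame with a list or array of integers looks the values up as column labels, whereas KFold.split returns row positions. The sketch below uses a small synthetic frame to show this. The column names f1 and f2 and the random data are made up for illustration; only upgraded comes from the question. It also checks that take and iloc pick the same rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Synthetic stand-in for df_use; "f1" and "f2" are hypothetical feature names.
rng = np.random.default_rng(0)
df_use = pd.DataFrame({
    "f1": rng.normal(size=10),
    "f2": rng.normal(size=10),
    "upgraded": rng.integers(0, 2, size=10),
})
x = df_use.drop(columns=["upgraded"])
y = df_use["upgraded"]

# Integer arrays inside [] are treated as column labels, hence the KeyError.
key_error_raised = False
try:
    x[np.arange(3)]
except KeyError:
    key_error_raised = True

kf = KFold(n_splits=5)
fold_sizes = []
for train_ix, val_ix in kf.split(x):
    # iloc selects rows by position, which is what KFold.split returns.
    train_x, val_x = x.iloc[train_ix], x.iloc[val_ix]
    train_y, val_y = y.iloc[train_ix], y.iloc[val_ix]
    # take(..., axis=0) is positional too, so it returns the same rows.
    assert train_x.equals(x.take(train_ix, axis=0))
    fold_sizes.append((len(train_x), len(val_x)))

print(key_error_raised, fold_sizes)
```

Since take and iloc are both positional, either form should get past the KeyError.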
I hope this works. Please comment below if you run into any issues.
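If the IndexError from the question still appears after this change, the split itself may not be the cause. The value -9223372036854775808 is the smallest 64-bit integer, which is commonly what NaN turns into when a float array is cast to int64. One possible explanation, and this is only an assumption because evaluate_model is not shown, is that the upgraded column contains missing values. A minimal check, using a made-up series in place of y:

```python
import numpy as np
import pandas as pd

# Hypothetical labels standing in for y = df_use['upgraded'].
y = pd.Series([0.0, 1.0, np.nan, 1.0, 0.0], name="upgraded")

# Count missing labels; anything above 0 means the labels need cleaning.
n_missing = int(y.isna().sum())
print("missing labels:", n_missing)

# One option: drop those rows before splitting (apply the same mask to x).
mask = y.notna()
y_clean = y[mask].astype("int64")
print(y_clean.tolist())
```

If the count is zero on the real data, the problem lies elsewhere in evaluate_model, and posting that function would help.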
Upvotes: 2