How to split train data and validation data properly in K fold cross validation

Question

I am not a native English speaker and am using a translator, so please bear with me if any sentences are awkward or hard to read.

I am trying to train a model with K-fold cross-validation, but I keep getting errors when I split the training data into folds. The following code sets up my data set.

df_test = df_data.iloc[50000:, :]  # Test set (starts at row 50000 so no row is skipped)
df_use = df_data.iloc[0:50000, :]  # Training set

x_test = df_test.drop(['upgraded'], axis=1)
y_test = df_test['upgraded']

x = df_use.drop(['upgraded'], axis=1)
y = df_use['upgraded']

Every time I try to split the training data and validation data, an error occurs.

for train_ix, val_ix in kfold.split(x):

    trainX, trainy = x[train_ix], y[train_ix]
    valX, valy = x[val_ix], y[val_ix]


    model, val_acc = evaluate_model(trainX, trainy, valX, valy)

I'm not sure whether this helps, but the line trainX, trainy = x[train_ix], y[train_ix] raises this error:

KeyError: "None of [Int64Index([10000, 10001, 10002, 10003, 10004, 10005, 10006, 10007, 10008, 10009, ... 49990, 49991, 49992, 49993, 49994, 49995, 49996, 49997, 49998, 49999], dtype='int64', length=40000)] are in the [columns]"

So I changed the code to this.

for train_ix, val_ix in kfold.split(x):

    trainX, valX = x.iloc[train_ix], x.iloc[val_ix]
    trainy, valy = y.iloc[train_ix], y.iloc[val_ix]

    model, val_acc = evaluate_model(trainX, trainy, valX, valy)

This time, the line model, val_acc = evaluate_model(trainX, trainy, valX, valy) raises this error:

IndexError: index -9223372036854775808 is out of bounds for axis 1 with size 2

I also tried the code below, after splitting df_use with train_test_split. The same IndexError occurs.

inputs = np.concatenate((x_train, x_val), axis=0)
targets = np.concatenate((y_train, y_val), axis=0)

I want to split the data correctly so that the K-fold cross-validation loop accepts it and the model can run. Any help would be appreciated.
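
For reference, here is a minimal sketch of the split I am aiming for, run on a small synthetic DataFrame because the real df_data is not shown here. The column names f1 and f2, the 20-row size, and the body of evaluate_model are all made up for illustration; my real evaluate_model trains a model instead. KFold.split yields positional row indices, which is why the rows are selected with .iloc rather than x[train_ix] (plain brackets on a DataFrame look up column labels, which would explain the KeyError). The value in the IndexError equals the smallest int64, which NumPy commonly produces when NaN is cast to an integer, so one possible cause (an assumption, not something I have confirmed) is missing values in the label column. The sketch counts them before training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Synthetic stand-in for df_use; the real data is not shown in the post.
rng = np.random.default_rng(0)
df_use = pd.DataFrame({
    "f1": rng.normal(size=20),
    "f2": rng.normal(size=20),
    "upgraded": rng.integers(0, 2, size=20),
})

x = df_use.drop(columns=["upgraded"])
y = df_use["upgraded"]

# Diagnostic (assumption about the IndexError): count missing labels,
# since NaN labels cast to int can turn into a huge negative index.
n_missing = int(y.isna().sum())


def evaluate_model(trainX, trainy, valX, valy):
    """Hypothetical placeholder: predicts the majority class of the training fold."""
    majority = trainy.mode().iloc[0]
    val_acc = float((valy == majority).mean())
    return majority, val_acc


kfold = KFold(n_splits=5, shuffle=True, random_state=1)
fold_sizes = []
scores = []
for train_ix, val_ix in kfold.split(x):
    # KFold yields positional indices, so rows are selected with .iloc.
    trainX, valX = x.iloc[train_ix], x.iloc[val_ix]
    trainy, valy = y.iloc[train_ix], y.iloc[val_ix]

    # Evaluate inside the loop so every fold is scored, not only the last one.
    model, val_acc = evaluate_model(trainX, trainy, valX, valy)
    fold_sizes.append((len(trainX), len(valX)))
    scores.append(val_acc)

print("missing labels:", n_missing)
print("fold sizes:", fold_sizes)
print("mean validation accuracy:", sum(scores) / len(scores))
```

If a model needs NumPy arrays instead of pandas objects, the same loop should work after converting with x.to_numpy() and y.to_numpy() and indexing those arrays directly with train_ix and val_ix.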

