Reputation: 109
I'm trying to run KFold on a pandas DataFrame that I read from a CSV file. Unfortunately, I'm getting this error:
"None of [Int64Index , 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,\n ...... dtype='int64')] are in the [columns]"
Here is my code:
import pandas as pd
from sklearn.model_selection import KFold

def getSlicesOfData(read_csv):
    slice_training_data = read_csv[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8"]]
    slice_prediction_data = read_csv[["best_move"]]
    return (slice_training_data, slice_prediction_data)

def getKFold(data_sliced):
    kf = KFold(n_splits=10, random_state=None, shuffle=False)
    return kf.split(data_sliced[0], data_sliced[1])
    #return TimeSeriesSplit(n_splits=10, max_train_size=9)

if __name__ == "__main__":
    read_csv = pd.read_csv('100games.csv')
    data_slice = getSlicesOfData(read_csv)
    for train_index, test_index in getKFold(data_slice):
        x_train, x_test = data_slice[0][train_index], data_slice[0][test_index]
        y_train, y_test = data_slice[1][train_index], data_slice[1][test_index]
What, if anything, am I doing wrong when I try to get the training data with:
x_train, x_test = data_slice[0][train_index], data_slice[0][test_index]
y_train, y_test = data_slice[1][train_index],data_slice[1][test_index]
Upvotes: 4
Views: 7701
Reputation: 1
It seems like you have a fair idea of what you aim to accomplish, but with a line such as [["best_move"]], perhaps calculate from the three best moves and give each a weighted chance of being selected and executed.
10 splits, no random state, no shuffle...
With 6 splits, 1.5 random and a 2 shuffle it may perform better, because your opponent may also have taken these shortcuts but managed to get hers running.
In life and in circuitry, when you take the risk of going off the path a bit, your opponent expects you to use the typical strategies. Don't.
I'm no coding expert, but from the fundamentals I am aware of, this just isn't quite enough. It's a computer; you must be extremely explicit with your instructions.
Upvotes: -2
Reputation: 3035
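Try iloc. KFold.split returns integer row positions, while indexing a DataFrame with [] and a list of integers looks them up as column labels, which is why the error refers to [columns].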
x_train, x_test = data_slice[0].iloc[train_index], data_slice[0].iloc[test_index]
y_train, y_test = data_slice[1].iloc[train_index], data_slice[1].iloc[test_index]
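
A minimal self-contained sketch of the same fix, assuming a small made-up frame in place of 100games.csv (the values and the non-default index are invented for illustration). It is meant to show that the indices from KFold are positions, which .iloc accepts regardless of the row labels:

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for 100games.csv; the values are made up.
df = pd.DataFrame(
    {
        "player": [1, 2, 1, 2, 1, 2],
        "0": [0, 1, 0, 1, 2, 0],
        "best_move": [4, 0, 8, 2, 6, 4],
    },
    index=[10, 11, 12, 13, 14, 15],  # labels deliberately differ from positions
)
X = df[["player", "0"]]
y = df[["best_move"]]

kf = KFold(n_splits=3, shuffle=False)
fold_sizes = []
for train_index, test_index in kf.split(X, y):
    # train_index / test_index are integer positions, so select rows with .iloc
    x_train, x_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    fold_sizes.append((len(x_train), len(x_test)))
```

Using X[train_index] here instead would try to find columns named 0, 1, 2, ... and fail in the way the question describes.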
Upvotes: 2
Reputation: 33127
Convert to numpy using: data_slice[0].values[train_index]
Try:
if __name__ == "__main__":
    read_csv = pd.read_csv('100games.csv')
    data_slice = getSlicesOfData(read_csv)
    for train_index, test_index in getKFold(data_slice):
        x_train, x_test = data_slice[0].values[train_index], data_slice[0].values[test_index]
        y_train, y_test = data_slice[1].values[train_index], data_slice[1].values[test_index]
See also: https://stackoverflow.com/a/51091177/5025009
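
As a rough, self-contained illustration of the NumPy route (the tiny frame below is invented and only stands in for the real CSV), the arrays can be indexed directly with the positions from KFold. The sketch uses .to_numpy(), which the pandas documentation recommends over .values; either works here:

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical data standing in for 100games.csv
df = pd.DataFrame(
    {"player": [1, 2, 1, 2], "0": [0, 1, 2, 0], "best_move": [4, 0, 8, 2]}
)
X = df[["player", "0"]].to_numpy()  # 2-D array, shape (4, 2)
y = df["best_move"].to_numpy()      # 1-D array, shape (4,)

kf = KFold(n_splits=2, shuffle=False)
splits = []
for train_index, test_index in kf.split(X):
    # NumPy arrays accept integer position arrays directly
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    splits.append((x_train.shape, x_test.shape, y_test.tolist()))
```

The trade-off is that column names and the row index are lost once the data is a plain array.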
Upvotes: 5
Reputation: 61
You are indexing a pandas DataFrame with the positions that KFold returns, and that is where the problem lies. Convert the data from pandas to NumPy and re-run the code. Afterwards, you may want to convert the results from NumPy back to pandas.
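
One possible way to do that round trip, sketched on invented data (the frame and its single feature column are assumptions, not the asker's file): split on the NumPy array, then rebuild a DataFrame from the slice with the original column names and row labels.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical data; only the column names echo the question.
df = pd.DataFrame({"player": [1, 2, 1, 2], "best_move": [4, 0, 8, 2]})
X = df[["player"]]
X_values = X.to_numpy()  # pandas -> NumPy

kf = KFold(n_splits=2, shuffle=False)
train_index, test_index = next(kf.split(X_values))  # first fold only

# NumPy -> pandas: restore the column names and the original row labels
x_train_df = pd.DataFrame(
    X_values[train_index],
    columns=X.columns,
    index=X.index[train_index],
)
```

The result should match what X.iloc[train_index] returns, so in practice indexing with .iloc avoids the conversion altogether.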
Upvotes: 6