plgent

Reputation: 109

"None of [Int64Index , 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,\n ...... dtype='int64')] are in the [columns]"

I'm trying to run KFold on a pandas DataFrame that I read from a CSV file. Unfortunately, I'm getting this error:

"None of [Int64Index , 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,\n ...... dtype='int64')] are in the [columns]"

Here is my code:

import pandas as pd
from sklearn.model_selection import KFold

def getSlicesOfData(read_csv):
    slice_training_data = read_csv[["player", "0", "1", "2", "3", "4", "5", "6", "7", "8"]]
    slice_prediction_data = read_csv[["best_move"]]
    return (slice_training_data, slice_prediction_data)

def getKFold(data_sliced):
    kf = KFold(n_splits=10, random_state=None, shuffle=False)
    return kf.split(data_sliced[0],data_sliced[1])
    #return TimeSeriesSplit(n_splits=10, max_train_size=9)

if __name__ == "__main__":
    read_csv = pd.read_csv('100games.csv')
    data_slice = getSlicesOfData(read_csv)
    for train_index, test_index in getKFold(data_slice):
        x_train, x_test = data_slice[0][train_index], data_slice[0][test_index]
        y_train, y_test = data_slice[1][train_index],data_slice[1][test_index]

What, if anything, am I doing wrong when I try to get the training data with:

x_train, x_test = data_slice[0][train_index], data_slice[0][test_index]
y_train, y_test = data_slice[1][train_index], data_slice[1][test_index]

Upvotes: 4

Views: 7701

Answers (4)

I mean... it seems like you have a fair idea of what you aim to accomplish, but... with a line such as [["best move"]]

perhaps calculate from the 3 best moves and give each a weighted chance of being selected and executed.

10 splits, no random, no shuffle...

With something like 6 splits, 1.5 random and a 2 shuffle it may perform better because... if your opponent has also taken these shortcuts but managed to get her running.

In life and in circuitry, when you take the risk of going off the path a bit, your opponent expects you to use the typical strategies. Don't.

I'm no coding expert, but from the fundamentals I am aware of... this just isn't quite enough. It's a computer; you must be extremely explicit with your instructions.

Upvotes: -2

dixhom

Reputation: 3035

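Try iloc. KFold.split returns positional row indices, while data_slice[0][train_index] looks those integers up as column labels, which is why none of them are found in the columns.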

x_train, x_test = data_slice[0].iloc[train_index], data_slice[0].iloc[test_index]
y_train, y_test = data_slice[1].iloc[train_index], data_slice[1].iloc[test_index]
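
For anyone who wants to try this without the original file, here is a minimal sketch of the same loop on made-up data. The 20-row random frame is only a stand-in for 100games.csv (an assumption, not the asker's data); the column names are copied from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for 100games.csv: 20 rows, column names from the question.
columns = ["player"] + [str(i) for i in range(9)] + ["best_move"]
rng = np.random.default_rng(0)
games = pd.DataFrame(rng.integers(0, 3, size=(20, len(columns))), columns=columns)

features = games[columns[:-1]]
target = games[["best_move"]]

kf = KFold(n_splits=10)
fold_shapes = []
for train_index, test_index in kf.split(features, target):
    # train_index and test_index are row positions, so select rows with .iloc
    x_train, x_test = features.iloc[train_index], features.iloc[test_index]
    y_train, y_test = target.iloc[train_index], target.iloc[test_index]
    fold_shapes.append((x_train.shape, x_test.shape))

# By contrast, features[train_index] treats the integers as column labels
# and raises a KeyError like the one in the question.
```

With 20 rows and 10 splits, each fold should hold 18 training rows and 2 test rows.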

Upvotes: 2

seralouk

Reputation: 33127

Convert to numpy using: data_slice[0].values[train_index]

Try:

if __name__ == "__main__":
    read_csv = pd.read_csv('100games.csv')
    data_slice = getSlicesOfData(read_csv)
    for train_index, test_index in getKFold(data_slice):
        x_train, x_test = data_slice[0].values[train_index], data_slice[0].values[test_index]
        y_train, y_test = data_slice[1].values[train_index], data_slice[1].values[test_index]

See also: https://stackoverflow.com/a/51091177/5025009

Upvotes: 5

ssazally

Reputation: 61

You're indexing a pandas DataFrame with the positional indices that K-fold returns, and that's where the problem lies. Try converting the data from pandas to NumPy and re-running the code. Afterwards, you may want to convert the data back from NumPy to pandas.
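
A rough sketch of that round trip, assuming a small made-up frame in place of the real CSV (the toy values and the feature_cols name are hypothetical, only for illustration):

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical toy frame; the real data would come from pd.read_csv('100games.csv').
frame = pd.DataFrame({"player": [1, 2] * 5, "0": range(10), "best_move": range(10, 20)})
feature_cols = ["player", "0"]

x = frame[feature_cols].to_numpy()  # 2-D array of features
y = frame["best_move"].to_numpy()   # 1-D array of targets

for train_index, test_index in KFold(n_splits=5).split(x):
    # NumPy arrays accept the positional indices directly
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Back to pandas, if labelled columns are needed downstream
    x_test_df = pd.DataFrame(x_test, columns=feature_cols, index=frame.index[test_index])
```

Passing the original index back in keeps each test row traceable to its row in the source frame.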

Upvotes: 6
