HilaD

Reputation: 881

sklearn kfold returning wrong indexes in python

I am using the KFold class from the sklearn package in Python on a df (data frame) with non-contiguous row indexes.

This is the code:

from sklearn.model_selection import KFold

kFold = KFold(n_splits=10, shuffle=True, random_state=None)
for train_index, test_index in kFold.split(dfNARemove):...

Some of the values I get in train_index or test_index don't exist in my df's index.

What can I do?

Upvotes: 13

Views: 7198

Answers (1)

Eduard Ilyasov

Reputation: 3308

The KFold iterator yields the positional indices (0 to n-1) of the train and validation rows of the DataFrame, not the DataFrame's own non-contiguous index labels. You can access your train and validation rows with the pandas .iloc indexer:

kFold = KFold(n_splits=10, shuffle=True, random_state=None)
for train_index, test_index in kFold.split(dfNARemove):
    train_data = dfNARemove.iloc[train_index]
    test_data = dfNARemove.iloc[test_index]

If you want to know which of the DataFrame's non-contiguous index labels correspond to train_index and test_index on each fold, you can do the following:

non_continuous_train_index = dfNARemove.index[train_index]
non_continuous_test_index = dfNARemove.index[test_index]
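
To see the difference between positions and labels end to end, here is a minimal, self-contained sketch. The frame df, its column x, and its index values are made up for illustration (they stand in for dfNARemove), and n_splits=3 with random_state=0 are arbitrary choices to keep the example small and repeatable:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical frame whose index has gaps, as if some rows had been dropped.
df = pd.DataFrame({"x": np.arange(6)}, index=[0, 2, 5, 7, 11, 13])

kf = KFold(n_splits=3, shuffle=True, random_state=0)

folds = []
for train_pos, test_pos in kf.split(df):
    # train_pos / test_pos are positions in range(len(df)), not index labels,
    # so they must be used with .iloc rather than .loc.
    train_data = df.iloc[train_pos]
    test_data = df.iloc[test_pos]
    # Map positions back to the original labels when they are needed.
    test_labels = df.index[test_pos]
    folds.append((train_pos, test_pos, train_data, test_data, test_labels))
```

Here every position falls in 0..5 even though the labels go up to 13, which is why looking the positions up as labels (for example with .loc) can fail or pick the wrong rows.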

Upvotes: 18
