Reputation: 299
I have a dataframe with 323 columns and 10348 rows. I want to split it using stratified k-fold with the following code:
import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv("path")
x = df.loc[:, ~df.columns.isin(['flag'])]
y = df['flag']
skf = StratifiedKFold(n_splits=5, random_state=None, shuffle=False)
for train_index, test_index in skf.split(x, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
But I get the following error:
KeyError: "None of [Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8,\n 10,\n ...\n 10338, 10339, 10340, 10341, 10342, 10343, 10344, 10345, 10346,\n 10347],\n dtype='int64', length=9313)] are in the [columns]"
Can anyone tell me why I get this error and how to fix it?
Upvotes: 1
Views: 11618
Reputation: 112
You can also use df.take(indices_list, axis=0), which selects rows by position:
x_train, x_test = x.take(list(train_index), axis=0), x.take(list(test_index), axis=0)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.take.html
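To illustrate what take does here, a minimal sketch on a made-up frame (the `toy` data and the `positions` array are assumptions standing in for the real dataframe and a train_index from skf.split). It is meant to show that take picks rows by position, the same way iloc does, regardless of the index labels:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a non-integer index, to make the
# positional behaviour of take() visible.
toy = pd.DataFrame(
    {"a": [10, 20, 30, 40], "flag": [0, 1, 0, 1]},
    index=["w", "x", "y", "z"],
)

# Stand-in for the integer position array that skf.split() yields.
positions = np.array([0, 2])

taken = toy.take(positions, axis=0)   # rows at positions 0 and 2
same_as_iloc = toy.iloc[positions]    # positional selection via iloc

print(taken)
```

Since take also accepts an integer array, the list(...) conversion in the answer should be optional.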
Upvotes: 1
Reputation: 76
It seems you have a dataframe slicing issue rather than something wrong with StratifiedKFold itself. I crafted a df to reproduce it and solved it by using iloc to select rows with the arrays of indexes:
from sklearn import model_selection
# The list of some column names in flag
flag = ["raw_sentence", "score"]
x = df.loc[:, ~df.columns.isin(flag)].copy()
y = df[flag].copy()
skf = model_selection.StratifiedKFold(n_splits=2, random_state=None, shuffle=False)
for train_index, test_index in skf.split(x, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    x_train, x_test = x.iloc[list(train_index)], x.iloc[list(test_index)]
train_index and test_index are NumPy ndarrays of row positions. I convert them to lists here, although iloc also accepts integer arrays directly.
You may refer to: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
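For completeness, here is a small self-contained sketch of the iloc approach applied to a single 'flag' target like the one in the question. The synthetic 20-row frame and the column names f1, f2, f3 are assumptions made up for illustration, not the asker's data. The comment in the loop notes why the original x[train_index] fails: plain square brackets on a DataFrame look the values up as column labels.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for the question's data:
# 20 rows, 3 feature columns, and a balanced binary 'flag' target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 3)), columns=["f1", "f2", "f3"])
df["flag"] = [0, 1] * 10

x = df.drop(columns=["flag"])
y = df["flag"]

skf = StratifiedKFold(n_splits=5)
fold_sizes = []
for train_index, test_index in skf.split(x, y):
    # x[train_index] would treat the integers as column labels and raise
    # KeyError; split() yields row positions, so select rows with .iloc.
    x_train, x_test = x.iloc[train_index], x.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    fold_sizes.append((len(x_train), len(x_test), int(y_test.sum())))

print(fold_sizes)
```

Using .iloc for y as well keeps the selection positional even if the Series index is not the default 0..n-1 range.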
Upvotes: 4
Reputation: 1
Try converting the pandas DataFrame to a NumPy array as follows:
pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
       [2, 4]])
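Applied to the split problem, this could look roughly like the sketch below. The small frame and the hard-coded train_index / test_index arrays are assumptions standing in for the real data and for what skf.split would yield. Once the data is a NumPy array, square-bracket indexing with an integer array selects rows, so the original indexing pattern works:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for the real data.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "flag": [0, 1, 0, 1]})

# Convert features and target to NumPy arrays.
x_arr = df.drop(columns=["flag"]).to_numpy()
y_arr = df["flag"].to_numpy()

# Stand-ins for the position arrays produced by skf.split().
train_index = np.array([0, 1])
test_index = np.array([2, 3])

# On ndarrays, integer-array indexing picks rows by position.
x_train, x_test = x_arr[train_index], x_arr[test_index]
y_train, y_test = y_arr[train_index], y_arr[test_index]

print(x_train)
print(y_test)
```

The trade-off is that column names and the index are lost after the conversion, so iloc may be preferable if they are still needed downstream.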
Upvotes: 0