Reputation: 861
I have a feature frame with just one column, named X, which contains float values, and a label vector y with binary classes (1 or 0). When I run X.isnull().sum(), it outputs 0, and the same is true of the label vector. But when I try to index X inside the KFold loop like this:
acc = []
for train_ind, test_ind in kf.split(X):
    X_train, X_test = X[train_ind], X[test_ind]
    y_train, y_test = y[train_ind], y[test_ind]
    dtree.fit(X_train, y_train)
    acc.append(accuracy_score(y_test, dtree.predict(X_test)))
print(acc)
print(np.array(acc).mean())
it raises an error saying: Input contains NaN, infinity or a value too large for dtype('float32'). And when I run X_train.isnull().sum(), it outputs 2. That means the indexing itself is generating 2 NaN values. Is my indexing correct for the feature and label vectors?
Upvotes: 2
Views: 1418
Reputation: 1
As @Utkarsh Sah mentioned, the problem is that some index labels are missing in the y dataframe. Reset the index before using y:
y = y.reset_index(drop=True)
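To illustrate what "missing indices" means here, a minimal sketch with made-up data (the values and the dropped labels 1 and 3 are assumptions, not the asker's data): after rows are dropped, the index keeps its old labels, and reset_index(drop=True) renumbers it from 0.

```python
import pandas as pd

# Hypothetical label vector; dropping labels 1 and 3 leaves gaps in the index.
y = pd.Series([0, 1, 1, 0, 1, 0], name="y").drop([1, 3])
print(list(y.index))        # gaps: labels 1 and 3 are gone

# reset_index returns a new object, so the result has to be assigned.
y_reset = y.reset_index(drop=True)
print(list(y_reset.index))  # contiguous labels starting at 0
```

If X went through the same filtering, it presumably needs the same treatment, since the NaN count in the question was observed on X_train.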
Upvotes: -1
Reputation: 301
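Not sure if this is the case, but I believe some index labels are missing in the y dataframe (for example, because rows were dropped earlier). Try resetting the index before running the loop, and assign the result back, since reset_index returns a new object rather than modifying y in place:
y = y.reset_index(drop=True)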
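An alternative that avoids depending on the index labels at all is to select rows by position. kf.split yields integer positions, so .iloc matches them directly. The sketch below uses made-up data and assumes X is a one-column DataFrame and y a Series, both with gaps in the index; the names kf and dtree mirror the question, but their settings here are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: labels 1 and 3 were dropped, so the index has gaps.
X = pd.DataFrame({"X": [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2]}).drop([1, 3])
y = pd.Series([0, 0, 0, 0, 1, 1, 1, 1]).drop([1, 3])

kf = KFold(n_splits=3)
dtree = DecisionTreeClassifier(random_state=0)

acc = []
nan_count = 0
for train_ind, test_ind in kf.split(X):
    # train_ind / test_ind are positions, so index by position with .iloc
    X_train, X_test = X.iloc[train_ind], X.iloc[test_ind]
    y_train, y_test = y.iloc[train_ind], y.iloc[test_ind]
    # Count any NaN values introduced by the selection (expected: none)
    nan_count += int(X_train.isnull().sum().sum()) + int(y_train.isnull().sum())
    dtree.fit(X_train, y_train)
    acc.append(accuracy_score(y_test, dtree.predict(X_test)))

print(acc)
print(np.mean(acc))
```

With .iloc the loop works whether or not the index was reset, which makes it a bit more robust if the data is filtered again later.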
Upvotes: 2