
scikit-learn's KFold generates NaN values

I have a feature frame X with just 1 column, which contains float values, and a label vector y with binary classes (1 or 0).

When I run X.isnull().sum(), it outputs 0, and the same is true for the label vector. But when I try to index X inside the KFold loop like this:

acc = []
for train_ind, test_ind in kf.split(X):
    X_train, X_test = X[train_ind], X[test_ind]
    y_train, y_test = y[train_ind], y[test_ind]

    dtree.fit(X_train, y_train)
    acc.append(accuracy_score(y_test, dtree.predict(X_test)))
    print(acc)
print(np.array(acc).mean())

it raises an error saying Input contains NaN, infinity or a value too large for dtype('float32'). When I then run X_train.isnull().sum(), it outputs 2, which means that indexing is introducing 2 NaN values. Is my indexing correct for the feature and label vectors?
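
For reference, here is a minimal sketch of one likely cause, assuming X and y are pandas objects whose index is not the default 0..n-1 (for example, after rows were filtered out). kf.split returns integer positions, while X[train_ind] on a pandas object looks up index labels; in older pandas versions a missing label could silently come back as NaN. The data below is made up for illustration, and the n_splits and random_state values are assumptions, not taken from the original code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: the index deliberately starts at 100 rather than 0,
# mimicking a frame whose rows were filtered before cross-validation.
rng = np.random.default_rng(0)
X = pd.DataFrame({"feature": rng.normal(size=20)}, index=np.arange(100, 120))
y = (X["feature"] > 0).astype(int)  # binary labels sharing X's index

kf = KFold(n_splits=5)  # assumed setting
dtree = DecisionTreeClassifier(random_state=0)

acc = []
for train_ind, test_ind in kf.split(X):
    # kf.split yields row positions, so select by position with .iloc
    X_train, X_test = X.iloc[train_ind], X.iloc[test_ind]
    y_train, y_test = y.iloc[train_ind], y.iloc[test_ind]

    dtree.fit(X_train, y_train)
    acc.append(accuracy_score(y_test, dtree.predict(X_test)))

print(np.mean(acc))
```

If this is indeed the cause, calling X.reset_index(drop=True) and y.reset_index(drop=True) before the loop, or converting both to NumPy arrays, should also make positional and label indexing agree.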

Upvotes: 2