Ahmed Dhanani

Reputation: 861

SKlearn's KFold generates NaN values

I have a feature frame X with a single column of float values, and a label vector y with binary classes (1 or 0).

When I run X.isnull().sum(), it outputs 0, and the same is true for the label vector. But when I index X inside the KFold loop like this:

acc = []
for train_ind, test_ind in kf.split(X):
    X_train, X_test = X[train_ind], X[test_ind]
    y_train, y_test = y[train_ind], y[test_ind]

    dtree.fit(X_train, y_train)
    acc.append(accuracy_score(y_test, dtree.predict(X_test)))
    print(acc)
print(np.array(acc).mean())

it raises an error saying "Input contains NaN, infinity or a value too large for dtype('float32')." When I then run X_train.isnull().sum(), it outputs 2, which means the indexing is introducing 2 NaN values. Is my indexing correct for the feature and label vectors?

Upvotes: 2

Views: 1418

Answers (2)

ENEG

Reputation: 1

As @Utkarsh Sah mentioned, the problem is that some index labels are missing in the y dataframe. Reset the index before using y:

y=y.reset_index(drop=True)
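To check whether the index really is the culprit, one could compare it against a default RangeIndex before and after the reset. This is only a minimal sketch: the data, the gap in the index, and the helper name `has_default_index` are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical labels whose index has gaps, e.g. after rows were filtered out.
y = pd.Series([1, 0, 1, 0, 1], index=[0, 1, 3, 4, 7])

def has_default_index(obj):
    """Return True if obj's index is exactly 0, 1, ..., len(obj) - 1."""
    return obj.index.equals(pd.RangeIndex(len(obj)))

before = has_default_index(y)   # checked while the gaps are still present
y = y.reset_index(drop=True)    # reassign: reset_index returns a new object
after = has_default_index(y)    # checked after the reset

print(before, after)
```

The same check can be applied to X, since the NaN values in the question show up in X_train as well.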

Upvotes: -1

Utkarsh Sah

Reputation: 301

Not sure if this is the case, but I believe some index labels are missing in the y dataframe. Try resetting the index before running the loop:

y = y.reset_index(drop=True)
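An alternative that does not depend on the index at all is positional indexing with `.iloc`, since `KFold.split` yields integer positions rather than index labels. The following is a minimal sketch, assuming X is a one-column DataFrame and y is a Series; the data, the number of splits, and the `random_state` values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data with a non-contiguous index (illustration only).
idx = np.arange(0, 40, 2)  # index labels 0, 2, 4, ..., 38
X = pd.DataFrame({"x": np.linspace(0.0, 1.0, 20)}, index=idx)
y = pd.Series((X["x"] > 0.5).astype(int).to_numpy(), index=idx)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
dtree = DecisionTreeClassifier(random_state=0)

acc = []
nan_count = 0
for train_pos, test_pos in kf.split(X):
    # kf.split yields row positions, so select rows by position with .iloc
    X_train, X_test = X.iloc[train_pos], X.iloc[test_pos]
    y_train, y_test = y.iloc[train_pos], y.iloc[test_pos]

    # Count NaN values introduced by the selection (none are expected here)
    nan_count += int(X_train.isnull().sum().sum())

    dtree.fit(X_train, y_train)
    acc.append(accuracy_score(y_test, dtree.predict(X_test)))

print(nan_count)
print(np.mean(acc))
```

With `.iloc` the selection is purely positional, so gaps in the index labels cannot turn into missing rows.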

Upvotes: 2
