Emma Lim

Reputation: 161

Appropriate action against "Input contains NaN, infinity or a value too large for dtype('float64')." error

I prepared a CSV file for training a LightGBM (LGBM) classifier and used the following code.

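from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=333)
lgbm_wrapper = LGBMClassifier(n_estimators=400)

# Use the held-out split as the early-stopping evaluation set
evals = [(X_test, y_test)]
lgbm_wrapper.fit(X_train, y_train, early_stopping_rounds=100,
                 eval_metric="logloss", eval_set=evals, verbose=True)
preds = lgbm_wrapper.predict(X_test)
pred_proba = lgbm_wrapper.predict_proba(X_test)[:, 1]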

But I get the following error.

/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    104                     msg_err.format
    105                     (type_err,
--> 106                      msg_dtype if msg_dtype is not None else X.dtype)
    107             )
    108     # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

To solve this problem, I first checked the data types of the columns.

Date             object
A                float64
B                 int64
C                 int64
D                float64
E                float64
F                float64
G                float64
H                 object
dtype: object

I also called X.dropna() beforehand to remove NaN values. However, the float64-related error still occurs. I need a little help.

My data looks like this (image not available).

Upvotes: 0

Views: 442

Answers (1)

shivam13juna

Reputation: 347

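Note that X.dropna() does not work in place: it returns a new DataFrame and leaves X unchanged. To actually drop the rows containing NaN you need X = X.dropna() (or X.dropna(inplace=True)). Whichever rows you drop from X, drop the corresponding rows from the target y as well, so that the two stay aligned.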
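To illustrate, here is a minimal sketch of that cleanup. The toy DataFrame and its column names (including "target") are made up for the example and only stand in for your CSV. It also converts infinities to NaN first, since dropna() on its own does not remove them and the error message mentions infinity too; whether your data actually contains infinities is an assumption on my part.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the CSV from the question.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0, 5.0],
    "B": [1, 2, 3, 4, 5],
    "D": [0.5, 1.5, np.inf, 2.5, 3.5],
    "target": [0, 1, 0, 1, 0],
})

X = df.drop(columns=["target"])
y = df["target"]

# dropna() ignores infinities, so turn them into NaN first.
X = X.replace([np.inf, -np.inf], np.nan)

# dropna() returns a new DataFrame; assign the result back.
X = X.dropna()

# Keep only the target rows whose index survived in X.
y = y.loc[X.index]

# Sanity checks before calling train_test_split / fit.
assert np.isfinite(X.to_numpy(dtype="float64")).all()
assert len(X) == len(y)
```

If the final check fails on your real data, printing X.isna().sum() per column should show which columns still contain missing values.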

Upvotes: 1
