Aniruddh

Reputation: 188

ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I am working through a loan prediction practice problem and trying to fill the missing values in my data. I obtained the data from here. To complete this problem I am following this tutorial.

You can find the entire code (file name model.py) I am using and the data on GitHub.

The DataFrame looks like this:

After the last line is executed (line 122 in the model.py file), I get the following output:

/home/user/.local/lib/python2.7/site-packages/numpy/lib/arraysetops.py:216: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  flag = np.concatenate(([True], aux[1:] != aux[:-1]))
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Traceback (most recent call last):
  File "model.py", line 123, in <module>
    classification_model(model, df,predictor_var,outcome_var)
  File "model.py", line 89, in classification_model
    model.fit(data[predictors],data[outcome])
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1173, in fit
    order="C")
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 407, in check_array
    _assert_all_finite(array)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 58, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I am getting this error because of the missing values. How do I fill these missing values?

The missing values for Self_Employed and LoanAmount are already filled. How do I fill the rest? Thank you for the help.

Upvotes: 1

Views: 3183

Answers (1)

jezrael

Reputation: 862481

You can use fillna:

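# Assign the result back; calling fillna(..., inplace=True) on a selected column
# is unreliable in recent pandas versions
df['Gender'] = df['Gender'].fillna('no data')
df['Married'] = df['Married'].fillna('no data')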

Or, if you need to fill multiple columns with the same value:

cols = ['Gender','Married']
df[cols] = df[cols].fillna('no data')
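
To find out which columns still need filling, it may help to count the missing values per column first. A minimal sketch, assuming a small made-up frame (the column names follow the question's dataset, the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: column names follow the question's dataset, values are made up
df = pd.DataFrame({'Gender': ['m', 'f', np.nan],
                   'Married': [np.nan, 'yes', 'no'],
                   'LoanAmount': [120.0, 66.0, 141.0]})

# Number of missing values in each column
missing = df.isnull().sum()

# Names of the columns that still contain at least one NaN
cols_with_nan = missing[missing > 0].index.tolist()
print(cols_with_nan)
```

Any column listed in cols_with_nan is a candidate for fillna.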

If you need to fill multiple columns with different values, you can pass a dict that maps column names to fill values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Gender':['m','f',np.nan],
                   'Married':[np.nan,'yes','no'],
                   'credit history':[1.,np.nan,0]})
print (df)
  Gender Married  credit history
0      m     NaN             1.0
1      f     yes             NaN
2    NaN      no             0.0

d = {'Gender':'no data', 'Married':'no data', 'credit history':0}
df = df.fillna(d)
print (df)
    Gender  Married  credit history
0        m  no data             1.0
1        f      yes             0.0
2  no data       no             0.0
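
Instead of a constant, the dict can also be built from the data itself. The following is only a sketch under assumptions: it uses the most frequent value for text columns and the median for numeric ones, which is one common choice and not necessarily the best one for this dataset. The sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the same column names as the example above
df = pd.DataFrame({'Gender': ['m', 'f', np.nan, 'm'],
                   'Married': [np.nan, 'yes', 'no', 'yes'],
                   'credit history': [1., np.nan, 0., 1.]})

# Build the fill dict: median for numeric columns, most frequent value otherwise
fill_values = {}
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        fill_values[col] = df[col].median()
    else:
        fill_values[col] = df[col].mode()[0]

filled = df.fillna(fill_values)

# Check that no NaN is left before passing the data to a model
print(filled.isnull().values.any())
```

Note that scikit-learn estimators such as LogisticRegression expect numeric input, so text columns like Gender and Married would still need to be encoded as numbers after filling.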

Upvotes: 1
