Khan11

Reputation: 285

How to remove NaN when getting the error: ValueError: Input contains NaN

I am trying to scale my data to the range 0-1 with MinMaxScaler(), using:

    x_scaling = x_scale.transform(x)

    print("Min:", np.min(x_scaling))
    print("Max:", np.max(x_scaling))

My traceback error message is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-75-c862a09c2cc2> in <module>()
  --> 120   x_scaling = x_scale.transform(x)
    121 
    122     print("Min:", np.min(x_scaling))

~/anaconda3_501/lib/python3.6/site-packages/sklearn/preprocessing/data.py in transform(self, X)
    365         check_is_fitted(self, 'scale_')
    366 
--> 367         X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)
    368 
    369         X *= self.scale_

~/anaconda3_501/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    451                              % (array.ndim, estimator_name))
    452         if force_all_finite:
--> 453             _assert_all_finite(array)
    454 
    455     shape_repr = _shape_repr(array.shape)

~/anaconda3_501/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
     42             and not np.isfinite(X).all()):
     43         raise ValueError("Input contains NaN, infinity"
---> 44                          " or a value too large for %r." % X.dtype)
     45 
     46 

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

My data does contain a NaN, because I shifted my data up by 1. My DataFrame looks like:

6  2012-01-01 07:00:00  0.022311  1.677769  6  2.963995
7  2012-01-01 08:00:00  0.014925  2.963995  7  5.062572
8  2012-01-01 09:00:00  0.096465  5.062572  8  7.065042
9  2012-01-01 10:00:00  0.284445  7.065042  9   **NaN**

The error message lists several possible causes, so I am not sure whether the NaN is the issue. If it is, how can I resolve it? Any help would be appreciated.

Upvotes: 1

Views: 2456

Answers (1)

iacob

Reputation: 24291

You want to use numpy.nanmin() and numpy.nanmax():

Return minimum of an array or minimum along an axis, ignoring any NaNs. When all-NaN slices are encountered a RuntimeWarning is raised and Nan is returned for that slice.
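
As a quick illustration of the difference (the array below is made up for this example, loosely modelled on a shifted column), np.min propagates the NaN while np.nanmin and np.nanmax skip it:

```python
import numpy as np

# Hypothetical column with a trailing NaN, similar to a shifted series.
col = np.array([2.963995, 5.062572, 7.065042, np.nan])

plain_min = np.min(col)    # the ordinary reduction propagates NaN
nan_min = np.nanmin(col)   # NaN is ignored
nan_max = np.nanmax(col)   # NaN is ignored

print(plain_min, nan_min, nan_max)
```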

e.g. instead of MinMaxScaler(), create a custom scaler that ignores NaNs like so:

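    lo, hi = 0, 1  # feature_range
    x_std = (x - np.nanmin(x, axis=0)) / (np.nanmax(x, axis=0) - np.nanmin(x, axis=0))
    x_scaled = x_std * (hi - lo) + lo

where lo, hi = feature_range, and axis=0 scales each column separately, as MinMaxScaler() does. The names lo and hi are used instead of min and max to avoid shadowing Python's built-ins.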
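
A minimal sketch of the same idea as a reusable function, assuming x is a 2-D numeric array with one feature per column. The name nan_minmax_scale, the handling of constant columns, and the sample values are assumptions made for illustration; none of this is part of scikit-learn.

```python
import numpy as np

def nan_minmax_scale(x, feature_range=(0.0, 1.0)):
    """Scale each column of x into feature_range, leaving NaNs in place."""
    x = np.asarray(x, dtype=float)
    lo, hi = feature_range
    col_min = np.nanmin(x, axis=0)   # per-column minimum, ignoring NaN
    col_max = np.nanmax(x, axis=0)   # per-column maximum, ignoring NaN
    span = col_max - col_min
    # Assumption: a constant column gets a span of 1 to avoid dividing by zero.
    span = np.where(span == 0, 1.0, span)
    return (x - col_min) / span * (hi - lo) + lo

# Hypothetical data: the second column ends in NaN, as after a shift.
x = np.array([
    [0.022311, 2.963995],
    [0.014925, 5.062572],
    [0.096465, 7.065042],
    [0.284445, np.nan],
])

x_scaled = nan_minmax_scale(x)
print("Min:", np.nanmin(x_scaled))
print("Max:", np.nanmax(x_scaled))
```

The NaN stays NaN in the output, so np.nanmin and np.nanmax are also needed when checking the result. If the row holding the NaN is not needed, dropping it before scaling (for example with DataFrame.dropna()) is the other common option.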

Upvotes: 1
