Reputation: 285
I am trying to scale my data to be between 0 and 1 using MinMaxScaler():
x_scaling = x_scale.transform(x)
print("Min:", np.min(x_scaling))
print("Max:", np.max(x_scaling))
My traceback error message is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-75-c862a09c2cc2> in <module>()
--> 120 x_scaling = x_scale.transform(x)
121
122 print("Min:", np.min(x_scaling))
~/anaconda3_501/lib/python3.6/site-packages/sklearn/preprocessing/data.py in transform(self, X)
365 check_is_fitted(self, 'scale_')
366
--> 367 X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)
368
369 X *= self.scale_
~/anaconda3_501/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
451 % (array.ndim, estimator_name))
452 if force_all_finite:
--> 453 _assert_all_finite(array)
454
455 shape_repr = _shape_repr(array.shape)
~/anaconda3_501/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
42 and not np.isfinite(X).all()):
43 raise ValueError("Input contains NaN, infinity"
---> 44 " or a value too large for %r." % X.dtype)
45
46
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
My data does contain a NaN, because I shifted one of my columns up by 1. My DataFrame looks like:
6 2012-01-01 07:00:00 0.022311 1.677769 6 2.963995
7 2012-01-01 08:00:00 0.014925 2.963995 7 5.062572
8 2012-01-01 09:00:00 0.096465 5.062572 8 7.065042
9 2012-01-01 10:00:00 0.284445 7.065042 9 **NaN**
If this is the issue (the error message lists several possibilities), help with resolving it would be appreciated.
Upvotes: 1
Views: 2456
Reputation: 24291
You want to use numpy.nanmin() and numpy.nanmax():
Return minimum of an array or minimum along an axis, ignoring any NaNs. When all-NaN slices are encountered a RuntimeWarning is raised and NaN is returned for that slice.
e.g. instead of MinMaxScaler(), apply a custom scaling that ignores NaNs, like so:
x_std = (x - np.nanmin(x))/(np.nanmax(x) - np.nanmin(x))
x_scaled = x_std * (max - min) + min
where min, max = feature_range, i.e. the desired output range (here 0 and 1).
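As a minimal sketch, the two lines above can be wrapped in a small helper (the name nan_minmax_scale and the feature_range parameter are illustrative, not from any library):

```python
import numpy as np

def nan_minmax_scale(x, feature_range=(0, 1)):
    # Min-max scale to feature_range, ignoring NaNs;
    # NaN entries pass through unchanged in the result.
    fmin, fmax = feature_range
    x = np.asarray(x, dtype=float)
    x_std = (x - np.nanmin(x)) / (np.nanmax(x) - np.nanmin(x))
    return x_std * (fmax - fmin) + fmin

x = np.array([1.0, 2.0, 3.0, np.nan, 5.0])
print(nan_minmax_scale(x))  # finite values land in [0, 1]; the NaN stays NaN
```

Note this assumes the array is not all-NaN and has at least two distinct finite values, otherwise the denominator is zero or NaN.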
Upvotes: 1