Maths12

SimpleImputer is not working with my data

Hi all,

I have np.nan and np.inf values in my data. I would like to replace them with 0s, but when I run the code below I get the following error:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0)
features_to_impute = data_fe.columns.tolist()

data_fe[features_to_impute] = pd.DataFrame(imputer.fit_transform(data_fe[features_to_impute]),
                                           columns=features_to_impute)


ValueError: Input contains infinity or a value too large for dtype('float64').

I'm not sure how to deal with this. Does anybody know how I can get around it and impute the infs as well?


Andy L.

If you want to replace both np.nan and np.inf with 0, just use np.nan_to_num (its nan, posinf and neginf keywords require NumPy 1.17 or later).

Example:

import numpy as np

a = np.array([[1, 2, np.nan, 5],
              [-np.inf, 9, 3, np.nan],
              [8, np.inf, np.nan, 9]])

Contents of a:
array([[  1.,   2.,  nan,   5.],
       [-inf,   9.,   3.,  nan],
       [  8.,  inf,  nan,   9.]])

b = np.nan_to_num(a, nan=0, posinf=0, neginf=0)

Contents of b:
array([[1., 2., 0., 5.],
       [0., 9., 3., 0.],
       [8., 0., 0., 9.]])
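The explicit posinf and neginf arguments matter here: by default np.nan_to_num replaces nan with 0 but replaces +inf and -inf with the largest and most negative finite values of the array's dtype, not with 0. A quick check on a small made-up array (x is purely illustrative):

```python
import numpy as np

# Illustrative input containing every kind of non-finite value plus one normal number
x = np.array([np.nan, np.inf, -np.inf, 1.5])

# Default behaviour: nan -> 0.0, +inf -> largest finite float64, -inf -> most negative finite float64
default = np.nan_to_num(x)

# Explicit keywords (NumPy >= 1.17): every non-finite value becomes 0.0
zeros = np.nan_to_num(x, nan=0, posinf=0, neginf=0)

print(default)
print(zeros)
```

So leaving out posinf/neginf would turn the infs into huge finite numbers rather than 0s.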

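So, in your case, just pass the selected columns of the DataFrame to np.nan_to_num and assign the result back, since it returns a new array rather than modifying data_fe in place: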

data_fe[features_to_impute] = np.nan_to_num(data_fe[features_to_impute], nan=0, posinf=0, neginf=0)
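A self-contained sketch on a small DataFrame. The values below are hypothetical stand-ins for the question's data_fe, which isn't shown. It also includes a pandas-only route (replace the infinities with nan, then fillna(0)) that should give the same result. The replace step on its own is also what you could run first if you would rather keep SimpleImputer, for example inside a Pipeline, since SimpleImputer handles nan but rejects inf.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_fe: numeric columns containing nan, +inf and -inf
raw = pd.DataFrame({
    "a": [1.0, np.nan, np.inf],
    "b": [-np.inf, 2.0, 3.0],
    "c": [4.0, 5.0, np.nan],
})

data_fe = raw.copy()
features_to_impute = data_fe.columns.tolist()

# np.nan_to_num returns a new NumPy array, so the result is written back into the columns
data_fe[features_to_impute] = np.nan_to_num(
    data_fe[features_to_impute], nan=0, posinf=0, neginf=0
)

# Pandas-only alternative: map +/-inf to nan first, then fill every nan with 0
alt = raw.replace([np.inf, -np.inf], np.nan).fillna(0)

print(data_fe)
print(alt)
```

Both versions assume the columns are numeric; for object-dtype columns np.nan_to_num leaves the values unchanged.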
