Reputation: 3151
I am trying to impute missing values in a pandas DataFrame using scikit-learn's SimpleImputer, as follows:
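import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(completeDF_encoded)
# Use the same fitted object here (the original called an undefined imp_mean)
FDS1 = imputer.transform(completeDF_encoded)
FDS1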
The NaNs are replaced, but transform returns a NumPy array instead of a DataFrame:
array([[1.0000e+00, 1.8800e+02, 0.0000e+00, ..., 0.0000e+00, 1.0000e+00,
        0.0000e+00],
       [2.0000e+00, 2.0900e+02, 0.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        1.0000e+00],
       [3.0000e+00, 2.5700e+02, 0.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        1.0000e+00],
       ...,
       [7.9998e+04, 2.5600e+02, 1.0000e+00, ..., 0.0000e+00, 1.0000e+00,
        0.0000e+00],
       [7.9999e+04, 2.5600e+02, 1.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        0.0000e+00],
       [8.0000e+04, 2.5600e+02, 1.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        0.0000e+00]])
How do I get the imputed data back as a DataFrame instead of a NumPy array?
Upvotes: 0
Views: 126
Reputation: 6564
I am using the following code to impute with the column mean:
for col in cols:
    # Assign the result back; fillna(..., inplace=True) on df[col] may not update df
    df[col] = df[col].fillna(df[col].mean())
cols is a list of the names of the columns you wish to impute, e.g.:
cols = ['col1', 'col2', 'col3']
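If you would rather keep SimpleImputer, one option is to wrap the returned array back into a DataFrame yourself, reusing the original column names and index. The following is only a minimal sketch: the small df is made-up data standing in for completeDF_encoded, and it assumes every column is numeric and has at least one non-missing value.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for completeDF_encoded: numeric columns with some NaNs.
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]},
    index=["x", "y", "z"],
)

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform returns a NumPy array; rebuild the DataFrame around it,
# reusing the original column names and index.
imputed = pd.DataFrame(
    imputer.fit_transform(df),
    columns=df.columns,
    index=df.index,
)
print(imputed)
```

Note that, with its default settings, SimpleImputer drops columns that are entirely NaN at fit time, in which case columns=df.columns would no longer match the array's shape. On scikit-learn 1.2 or later, calling imputer.set_output(transform="pandas") before fitting should make transform return a DataFrame directly, though I have not checked that against your data.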
Upvotes: 1