user3243499

How to use SimpleImputer on a pandas DataFrame?

I am trying to impute the missing values in a pandas DataFrame with scikit-learn's SimpleImputer as follows:

import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(completeDF_encoded)

FDS1 = imputer.transform(completeDF_encoded)
FDS1

But transform returns a NumPy array (with all NaNs replaced) instead of a DataFrame:

array([[1.0000e+00, 1.8800e+02, 0.0000e+00, ..., 0.0000e+00, 1.0000e+00,
        0.0000e+00],
       [2.0000e+00, 2.0900e+02, 0.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        1.0000e+00],
       [3.0000e+00, 2.5700e+02, 0.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        1.0000e+00],
       ...,
       [7.9998e+04, 2.5600e+02, 1.0000e+00, ..., 0.0000e+00, 1.0000e+00,
        0.0000e+00],
       [7.9999e+04, 2.5600e+02, 1.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        0.0000e+00],
       [8.0000e+04, 2.5600e+02, 1.0000e+00, ..., 1.0000e+00, 0.0000e+00,
        0.0000e+00]])

How do I get back the imputed dataframe instead of numpy array?
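A minimal sketch of one common way to do this: pass the array returned by the imputer to pd.DataFrame, reusing the original column labels and row index. The small DataFrame below is a hypothetical stand-in for completeDF_encoded; the sketch assumes every column is numeric and that no column is entirely NaN (with strategy='mean', SimpleImputer drops all-NaN columns by default, so the column count would no longer match).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for completeDF_encoded: numeric columns containing NaNs,
# with a non-default index to show that it is preserved.
completeDF_encoded = pd.DataFrame(
    {
        "id": [1.0, 2.0, 3.0],
        "score": [188.0, np.nan, 257.0],
        "flag": [0.0, 1.0, np.nan],
    },
    index=[10, 11, 12],
)

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform (equivalent to fit followed by transform) returns a NumPy array,
# so wrap it back into a DataFrame with the original column labels and index.
FDS1 = pd.DataFrame(
    imputer.fit_transform(completeDF_encoded),
    columns=completeDF_encoded.columns,
    index=completeDF_encoded.index,
)
print(FDS1)
```

Here the missing score becomes the mean of 188 and 257 (222.5) and the missing flag becomes 0.5. With scikit-learn 1.2 or later, calling imputer.set_output(transform="pandas") before transforming should make transform return a DataFrame directly.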

Answers (1)

gtomer

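I use the following code to fill each column's missing values with that column's mean. Assigning the result back avoids calling fillna(..., inplace=True) on df[col], a chained-assignment pattern that recent pandas versions warn about and that no longer modifies df under copy-on-write:

for col in cols:
    df[col] = df[col].fillna(df[col].mean())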

cols is a list of the names of the columns you want to impute, for example:

cols = ['col1', 'col2', 'col3']
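The loop can also be written as a single vectorized call: DataFrame.mean() returns a Series of per-column means, and fillna aligns that Series with the columns by label. This is a sketch on a small hypothetical DataFrame; the extra "other" column is there only to show that columns outside cols are left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; "other" is not listed in cols.
df = pd.DataFrame(
    {
        "col1": [1.0, np.nan, 3.0],
        "col2": [np.nan, 4.0, 8.0],
        "col3": [5.0, 5.0, np.nan],
        "other": [np.nan, 1.0, 2.0],
    }
)
cols = ["col1", "col2", "col3"]

# Fill each listed column's NaNs with that column's mean, in one assignment.
df[cols] = df[cols].fillna(df[cols].mean())
print(df)
```

The result is still a DataFrame, so no conversion back from a NumPy array is needed.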
