Arvind Sudheer

Reputation: 123

DataFrame becomes a NumPy array after using SimpleImputer. I want it to return a DataFrame

In the notebook below, after imputing the missing values with SimpleImputer, the DataFrame was converted to a NumPy array. How do I make sure that its type remains a DataFrame?

import pandas as pd
from sklearn.impute import SimpleImputer

df1 = pd.read_excel("dummy.xlsx")

DataFrame before imputing the values.

imp = SimpleImputer(strategy='median')
df2 = imp.fit_transform(df1)  # returns a NumPy array, not a DataFrame
df2


Upvotes: 1

Views: 2226

Answers (1)

gehbiszumeis

Reputation: 3711

The documentation of sklearn.impute.SimpleImputer.fit_transform says clearly that it will return a numpy.array:

Returns: X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

So, by default, fit_transform will not keep the type as a DataFrame. However, you can pass the resulting NumPy array to the pandas.DataFrame() constructor:

from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np

# Mocking your data
df = pd.DataFrame(np.random.rand(10,3))
df[df > 0.9] = np.nan

imp = SimpleImputer(strategy='median')

# Feeding resulting numpy array from fit_transform directly to new df2
df2 = pd.DataFrame(imp.fit_transform(df))

That's it:

>>> type(df2)
pandas.core.frame.DataFrame
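
Note that building the DataFrame from the bare array resets the column names and index to default integers. Below is a minimal sketch of one way to keep them, by passing the original labels back to the constructor. The sample data, column names and index labels are made up for illustration, and the approach assumes no column is entirely NaN (SimpleImputer drops such columns by default, so the column count would no longer match). The last lines show an alternative that should work if your scikit-learn is version 1.2 or later, where transformers have a set_output method; the hasattr check is only there so the sketch still runs on older versions.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data: column names and index labels are invented for this example
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]},
    index=["x", "y", "z"],
)

imp = SimpleImputer(strategy="median")

# Reattach the original labels to the array returned by fit_transform.
# Assumes no column is entirely NaN (those would be dropped by default).
df2 = pd.DataFrame(imp.fit_transform(df), columns=df.columns, index=df.index)

# Alternative, assuming scikit-learn >= 1.2: ask the transformer for pandas output
if hasattr(imp, "set_output"):
    df3 = imp.set_output(transform="pandas").fit_transform(df)
```

With either variant the result is a DataFrame that carries the same column names and index as the input, so later label-based selection such as df2["a"] keeps working.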

Upvotes: 2
