Lostsoul

Reputation: 26037

sklearn's imputer reducing columns?

I'm wondering if anyone can help explain some weird behavior I'm seeing with sklearn's IterativeImputer.

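from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must come before the IterativeImputer import)
from sklearn.impute import IterativeImputer

imputer = IterativeImputer(max_iter=100)
print("dateframe shape ", dataframe.shape)
tempDF = imputer.fit_transform(dataframe)  # returns a NumPy array, not a DataFrame
print("imputer shape: ", tempDF.shape)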

I assumed the shape would stay the same, but the results are:

dateframe shape  (1978, 100)
imputer shape:  (1978, 91)

I found this problem when I was converting the NumPy array that sklearn returns back into a pandas DataFrame, which fails because the column counts no longer match:

tempDF = pd.DataFrame(tempDF, index=dataframe.index, columns=dataframe.columns)

Any suggestions on what I can do to keep the original shape when using the imputer?

Upvotes: 1

Views: 434

Answers (1)

PV8

Reputation: 6270

This probably happens because some of your columns contain only NaN values. I created a small example for you, following the docs:

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# The third column contains only NaN
X = [[7, 2, np.nan], [4, np.nan, np.nan], [10, 5, np.nan]]
imp_mean = IterativeImputer(random_state=0)
imp_mean.fit(X)
imp_mean.transform(X)
array([[ 7.      ,  2.      ],
       [ 4.      , -0.999998],
       [10.      ,  5.      ]])

So if a column is entirely NaN, the IterativeImputer has no idea how to fit and transform it, and the column is left out of the result. The original example in the docs ends up with a (3, 3) shape.
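To keep the original shape, one option is to set the all-NaN columns aside, impute the rest, and then put the empty columns back. The following is only a minimal sketch under that assumption; `impute_keep_shape` is a hypothetical helper name, not part of sklearn, and the reinserted columns stay NaN because there is nothing to learn from them:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def impute_keep_shape(df, **imputer_kwargs):
    """Hypothetical helper: impute df but keep columns that are entirely NaN."""
    # Columns with no observed value at all; the imputer would drop these
    all_nan_cols = df.columns[df.isna().all()]
    usable = df.drop(columns=all_nan_cols)

    imputer = IterativeImputer(**imputer_kwargs)
    imputed = pd.DataFrame(
        imputer.fit_transform(usable),
        index=usable.index,
        columns=usable.columns,
    )
    # Restore the original column order; the all-NaN columns come back as NaN
    return imputed.reindex(columns=df.columns)


df = pd.DataFrame(
    [[7, 2, np.nan], [4, np.nan, np.nan], [10, 5, np.nan]],
    columns=["a", "b", "c"],
)
result = impute_keep_shape(df, random_state=0)
print(result.shape)  # (3, 3)
```

Running `dataframe.columns[dataframe.isna().all()]` on your own data should also tell you which 9 columns are being dropped. If you are on scikit-learn 1.2 or later, the imputers also accept a `keep_empty_features=True` argument, which may be a simpler way to get the same shape; check the docs for your installed version.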

Upvotes: 1
