Reputation: 26037
I'm wondering if anyone can help explain a weird behavior I'm seeing with sklearn's IterativeImputer.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
imputer = IterativeImputer(max_iter=100)
print("dataframe shape ", dataframe.shape)
tempDF = imputer.fit_transform(dataframe)
print("imputer shape: ", tempDF.shape)
I expected the shape to stay the same, but the results are:
dataframe shape (1978, 100)
imputer shape: (1978, 91)
I ran into this when converting the NumPy array that sklearn returns back into a pandas DataFrame, which fails because the column counts no longer match:
tempDF = pd.DataFrame(tempDF, index=dataframe.index, columns=dataframe.columns)
Any suggestions on what I can do to keep the original shape when using the imputer?
Upvotes: 1
Views: 434
Reputation: 6270
This probably happens because some of your columns contain only NaN values. Here is a small example, adapted from the docs:
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the experimental API)
from sklearn.impute import IterativeImputer
# The third column is NaN in every row.
X = [[7, 2, np.nan], [4, np.nan, np.nan], [10, 5, np.nan]]
imp_mean = IterativeImputer(random_state=0)
imp_mean.fit(X)
imp_mean.transform(X)
array([[ 7. , 2. ],
[ 4. , -0.999998],
[10. , 5. ]])
So if a column is NaN in every row, the IterativeImputer has nothing to fit on and drops that column from the output.
The original example in the docs, where every column has at least one observed value, ends up with a (3, 3) shape.
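One way to keep the original shape is to impute only the columns that have at least one observed value and then reindex back to the original columns. The following is a minimal sketch, not a definitive fix: the toy `dataframe`, the `kept` variable, and the column names are made up for illustration, and the all-NaN column is simply left as NaN in the result.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy frame; column "c" is NaN in every row.
dataframe = pd.DataFrame({
    "a": [7.0, 4.0, 10.0],
    "b": [2.0, np.nan, 5.0],
    "c": [np.nan, np.nan, np.nan],
})

# Columns that contain at least one observed (non-NaN) value.
kept = dataframe.columns[dataframe.notna().any()]

# Impute only those columns, so the output width matches len(kept).
imputer = IterativeImputer(random_state=0)
imputed = imputer.fit_transform(dataframe[kept])

# Rebuild a DataFrame, then restore the dropped columns (filled with NaN).
result = pd.DataFrame(imputed, index=dataframe.index, columns=kept)
result = result.reindex(columns=dataframe.columns)

print(result.shape)
```

If your scikit-learn version is 1.2 or newer, `IterativeImputer` also accepts a `keep_empty_features=True` argument that keeps all-NaN columns in the output; check the documentation for your installed version before relying on it.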
Upvotes: 1