Bestname

Reputation: 303

Prevent Imputer from losing values

I am currently trying to impute a dependent variable that is stored as a pandas Series. (Don't ask why.) This is the dataset:

y.head(15)

Out[138]: 
0     13495.0
1     16500.0
2     16500.0
3     13950.0
4     17450.0
5     15250.0
6     17710.0
7     18920.0
8     23875.0
9         NaN
10    16430.0
11    16925.0
12    20970.0
13    21105.0
14    24565.0
Name: price, dtype: float64

If I try to impute this variable, something strange happens:

len(y) # 15

from sklearn.preprocessing import Imputer
mean_imputer_y = Imputer(strategy="mean", axis=0)
imputed_y = mean_imputer_y.fit_transform(y)

len(imputed_y) # 14

It is clearly doing the absolute opposite of what the Imputer should do. I don't want to delete the NaN values; I want to impute them.

Is there an explanation for this behaviour? What am I doing wrong?

Thanks for your help!

Upvotes: 3

Views: 435

Answers (1)

BENY

Reputation: 323236

You should use axis=1 rather than axis=0.

import numpy as np
from sklearn.preprocessing import Imputer

mean_imputer_y = Imputer(strategy="mean", axis=1, missing_values=np.nan)

# df.Val is my copy of the price column (y in the question)
mean_imputer_y.fit_transform(df.Val)


array([[13495. , 16500. , 16500. , 13950. , 17450. , 15250. , 17710. ,
        18920. , 23875. , 18117.5, 16430. , 16925. , 20970. , 21105. ,
        24565. ]])
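
A likely explanation for the lost value, as far as I understand the old Imputer: a 1-D input is treated as a single row, so with axis=0 the NaN sits alone in its own column, and columns with no observed values are discarded on transform. If a 1-D result of the same length is all that is needed, a pandas-only version avoids the axis question entirely. The snippet below is a minimal sketch, not the only way to do it; the Series is rebuilt by hand from the values printed in the question, and `imputed_y` is simply the name the question uses.

```python
import numpy as np
import pandas as pd

# The 15 prices shown in the question; index 9 is the missing one.
y = pd.Series(
    [13495, 16500, 16500, 13950, 17450, 15250, 17710, 18920,
     23875, np.nan, 16430, 16925, 20970, 21105, 24565],
    name="price",
    dtype="float64",
)

# Series.mean() skips NaN by default, so this is the mean of the
# 14 observed values; fillna() then writes it into the gap.
imputed_y = y.fillna(y.mean())

print(len(imputed_y))  # 15
print(imputed_y[9])    # 18117.5
```

The filled value matches the 18117.5 in the array above. Note that later scikit-learn releases removed `sklearn.preprocessing.Imputer` in favour of `sklearn.impute.SimpleImputer`, which has no axis argument and expects 2-D input, so there the Series would need to be reshaped into a single column first.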

Upvotes: 2
