Reputation: 23
I am trying to impute missing values as the mean of other values in the column; however, my code is having no effect. Does anyone know what I may be doing wrong? Thanks!
My code:
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(x[:, 1:3])
x[:, 1:3] = imputer.transform(x[:, 1:3])
print(dataset)
Output
Country Age Salary Purchased
0 France 44.0 72000.0 No
1 Spain 27.0 48000.0 Yes
2 Germany 30.0 54000.0 No
3 Spain 38.0 61000.0 No
4 Germany 40.0 NaN Yes
5 France 35.0 58000.0 Yes
6 Spain NaN 52000.0 No
7 France 48.0 79000.0 Yes
8 Germany 50.0 83000.0 No
9 France 37.0 67000.0 Yes
Upvotes: 2
Views: 2105
Reputation: 21709
You can do the following, assuming df is your dataset:
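from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
# fit_transform computes the column means and fills the NaNs in one call;
# assigning to df[[...]] writes the result back into the DataFrame itself
df[['Age', 'Salary']] = imputer.fit_transform(df[['Age', 'Salary']])
print(df)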
Country Age Salary Purchased
0 France 44.000000 72000.000000 No
1 Spain 27.000000 48000.000000 Yes
2 Germany 30.000000 54000.000000 No
3 Spain 38.000000 61000.000000 No
4 Germany 40.000000 63777.777778 Yes
5 France 35.000000 58000.000000 Yes
6 Spain 38.777778 52000.000000 No
7 France 48.000000 79000.000000 Yes
8 Germany 50.000000 83000.000000 No
9 France 37.000000 67000.000000 Yes
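Note that Imputer was deprecated in scikit-learn 0.20 and removed in 0.22, so the code above will not import on current versions. A minimal sketch of the same approach with its replacement, sklearn.impute.SimpleImputer, follows. The DataFrame is rebuilt from the values printed in the question, and missing_values=np.nan takes the place of the old 'NaN' string. SimpleImputer always imputes column by column, which corresponds to the old axis=0.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Dataset rebuilt from the output shown in the question
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# SimpleImputer works per column (the old axis=0 behaviour)
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Assign the imputed array back to the same columns of df
df[["Age", "Salary"]] = imputer.fit_transform(df[["Age", "Salary"]])
print(df)
```

The filled-in values should match the output above: 38.777778 for Age (349 / 9) and 63777.777778 for Salary (574000 / 9).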
Upvotes: 3
Reputation: 3662
You're assigning an Imputer object to the variable imputer:
imputer = Imputer(missing_values ='NaN', strategy = 'mean', axis = 0)
You then call the fit() method on your Imputer object, and then the transform() method.
Then you print the dataset variable, and I'm not sure where that comes from: the imputed values were assigned to x, not to dataset. Did you mean to print x, the Imputer object, or the result of one of those calls instead?
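To illustrate why printing dataset shows no change: the question doesn't show how x was created, so the sketch below assumes the common pattern x = dataset.iloc[:, :-1].values (a hypothetical reconstruction, not the asker's actual code). For a DataFrame with mixed column types, .values builds a new NumPy array, so writing imputed values into x leaves dataset untouched. SimpleImputer is used here because Imputer no longer exists in current scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Dataset rebuilt from the output shown in the question
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# Hypothetical: how x is often built in tutorials (Country, Age, Salary).
# With mixed column types this is a separate object-dtype array, not a view.
x = dataset.iloc[:, :-1].values

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
# Columns 1:3 of x are Age and Salary; cast to float before imputing
x[:, 1:3] = imputer.fit_transform(x[:, 1:3].astype(float))

print(x)        # the imputed means appear here
print(dataset)  # still contains NaN, because only x was modified
```

To see the imputed values in the DataFrame itself, impute the DataFrame columns directly, as in the other answer.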
Upvotes: 0