Reputation: 21
I'm working on the Kaggle notebook "Missing Values" from the Intermediate Machine Learning course.
I am using SimpleImputer to preprocess a dataframe. Three columns need to be imputed, and I have applied the same SimpleImputer to all of them together, so they all get the same strategy.
Is there a way to use a different strategy (mean, median, ...) for each column separately?
Here's what I have for the moment:
import pandas as pd
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='median')
imputed_X_train = pd.DataFrame(imputer.fit_transform(X_train))
imputed_X_valid = pd.DataFrame(imputer.transform(X_valid))
Upvotes: 1
Views: 740
Reputation: 2372
Yes, you should use ColumnTransformer. This will apply a median imputer to the columns in median_columns_list and a mean imputer to the columns in mean_columns_list:
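from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
# Each tuple is (name, transformer, columns the transformer is applied to).
# Columns in neither list are dropped by default (remainder='drop').
ct = ColumnTransformer(
    [("median_imp", SimpleImputer(strategy='median'), median_columns_list),
     ("mean_imp", SimpleImputer(strategy='mean'), mean_columns_list)])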
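To show how this fits the train/validation flow from the question, here is a minimal end-to-end sketch. The toy dataframes and the column names "a", "b", "c" are hypothetical stand-ins for the notebook's data, and remainder="passthrough" is my addition so that columns not being imputed are kept rather than dropped. ColumnTransformer returns an array whose columns follow the order of the transformers, followed by the passthrough columns, so the column names are rebuilt in that order.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for X_train / X_valid.
X_train = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 100.0],  # median of observed values: 3.0
    "b": [2.0, 4.0, np.nan, 6.0],    # mean of observed values: 4.0
    "c": [10.0, 20.0, 30.0, 40.0],   # no missing values
})
X_valid = pd.DataFrame({
    "a": [np.nan, 5.0],
    "b": [np.nan, 1.0],
    "c": [50.0, 60.0],
})

# Assumed column assignment, for illustration only.
median_columns_list = ["a"]
mean_columns_list = ["b"]

ct = ColumnTransformer(
    [
        ("median_imp", SimpleImputer(strategy="median"), median_columns_list),
        ("mean_imp", SimpleImputer(strategy="mean"), mean_columns_list),
    ],
    remainder="passthrough",  # keep unlisted columns (the default drops them)
)

# Output column order: transformer columns first, then passthrough columns.
listed = median_columns_list + mean_columns_list
remaining = [c for c in X_train.columns if c not in listed]
out_columns = listed + remaining

# Fit on the training data only, then reuse the fitted statistics on validation.
imputed_X_train = pd.DataFrame(
    ct.fit_transform(X_train), columns=out_columns, index=X_train.index
)
imputed_X_valid = pd.DataFrame(
    ct.transform(X_valid), columns=out_columns, index=X_valid.index
)
```

The validation set is filled with the median and mean learned from the training set, which mirrors the fit_transform / transform split in the question.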
Upvotes: 2