13_lsicheld

Reputation: 21

Can the use of SimpleImputer be dissociated for different columns of a dataframe?

I'm working on the Kaggle notebook "Missing Values" from the Intermediate Machine Learning course, and I am using SimpleImputer to preprocess a dataframe. Three columns need to be imputed. So far I have applied a single SimpleImputer to all of them at once, with the same strategy. Is there a way to use a different strategy (mean, median, ...) for each column separately?

Here's what I have for the moment:

import pandas as pd
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='median')
imputed_X_train = pd.DataFrame(imputer.fit_transform(X_train))
imputed_X_valid = pd.DataFrame(imputer.transform(X_valid))

Upvotes: 1

Views: 740

Answers (1)

Manu Valdés

Reputation: 2372

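Yes, use ColumnTransformer. The snippet below applies a median imputer to the columns listed in median_columns_list and a mean imputer to those in mean_columns_list. Note that columns in neither list are dropped from the output by default; pass remainder='passthrough' to keep them.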

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# median_columns_list and mean_columns_list are lists of column names
ct = ColumnTransformer(
    [("median_imp", SimpleImputer(strategy='median'), median_columns_list),
     ("mean_imp", SimpleImputer(strategy='mean'), mean_columns_list)])
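
For completeness, here is a minimal end-to-end sketch of how this could be fitted and applied to a training and validation set. The toy data, the column names "a", "b", "c", and the out_cols helper are all hypothetical stand-ins for the notebook's actual X_train and X_valid; remainder="passthrough" is my assumption that you want to keep the columns that need no imputation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for the notebook's X_train / X_valid.
X_train = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 100.0],
    "b": [2.0, 4.0, np.nan, 6.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})
X_valid = pd.DataFrame({"a": [np.nan], "b": [np.nan], "c": [5.0]})

# Hypothetical column assignments: one strategy per group of columns.
median_columns_list = ["a"]
mean_columns_list = ["b"]

ct = ColumnTransformer(
    [("median_imp", SimpleImputer(strategy="median"), median_columns_list),
     ("mean_imp", SimpleImputer(strategy="mean"), mean_columns_list)],
    remainder="passthrough",  # assumption: keep columns that need no imputation
)

# The output columns follow the transformer order, then the passthrough
# columns in their original order, so rebuild the names accordingly.
imputed_cols = median_columns_list + mean_columns_list
out_cols = imputed_cols + [c for c in X_train.columns if c not in imputed_cols]

# Fit on the training data only, then reuse the fitted statistics on validation.
imputed_X_train = pd.DataFrame(
    ct.fit_transform(X_train), columns=out_cols, index=X_train.index)
imputed_X_valid = pd.DataFrame(
    ct.transform(X_valid), columns=out_cols, index=X_valid.index)
```

Because the column order of the result depends on the order of the transformers rather than on the original dataframe, restoring the names explicitly (as out_cols does here) avoids silently mislabelled columns.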

Upvotes: 2
