Is there an easy way to label encode then one hot encode with column transformer?

Question

I'm using ColumnTransformer from sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html

I think it's a great tool for managing different scalers, and it makes life a lot easier. However, I'm not sure whether it can handle categorical values, since I need to label encode them first and then one hot encode the resulting integers. Is there anything that does both steps for me? I'd prefer to have all my scaling done in one ColumnTransformer, since that's much easier to manage.

Chris · Accepted Answer

sklearn.preprocessing.OneHotEncoder can take care of categorical values without preprocessing them into ints:

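import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'col1': ['a', 'b', 'c']})
# sparse_output=False returns a dense array (this argument was named `sparse` before scikit-learn 1.2)
ohe = OneHotEncoder(sparse_output=False)
ohe.fit_transform(df['col1'].values.reshape(-1, 1))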

Output:

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
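
Since the question is about keeping everything in one ColumnTransformer, here is a minimal sketch of how the encoder could sit next to a scaler. The numeric column col2, its values, and the step names 'onehot' and 'scale' are made up for illustration and are not from the question. sparse_threshold=0 is only there to force a dense result so it is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: col1 is categorical (strings), col2 is numeric
df = pd.DataFrame({'col1': ['a', 'b', 'c'], 'col2': [1.0, 2.0, 3.0]})

ct = ColumnTransformer(
    [
        # OneHotEncoder accepts the string column directly, no LabelEncoder step
        ('onehot', OneHotEncoder(handle_unknown='ignore'), ['col1']),
        # Numeric columns go through a scaler in the same transformer
        ('scale', StandardScaler(), ['col2']),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = ct.fit_transform(df)
print(X.shape)  # 3 one-hot columns followed by 1 scaled column
```

handle_unknown='ignore' is optional. It makes the encoder output all zeros for categories it did not see during fit instead of raising an error, which is often useful at prediction time.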
