learningthemachine

Reputation: 614

Is there an easy way to label encode then one hot encode with column transformer?

I'm using ColumnTransformer from sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html

I think it's a great tool for managing different scalers, and it makes life a lot easier. However, I'm not sure how to handle categorical values with it, since I need to first use a label encoder and then one-hot encode the resulting integers. Is there anything that does both of these steps for me? I'd prefer to have all my preprocessing done in one ColumnTransformer, because it's much easier to manage.

Upvotes: 0

Views: 425

Answers (2)

Chris

Reputation: 29732

sklearn.preprocessing.OneHotEncoder can take care of categorical values without preprocessing them into ints:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'col1': ['a', 'b', 'c']})
# scikit-learn >= 1.2 uses sparse_output; older releases spell it sparse=False
ohe = OneHotEncoder(sparse_output=False)
ohe.fit_transform(df[['col1']])

Output:

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
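
To tie this back to the question: since OneHotEncoder accepts strings directly, it can sit in the same ColumnTransformer as a scaler. Below is a minimal sketch, not a definitive setup. The column names 'age' and 'city' and the data are made up for illustration, and StandardScaler is just a stand-in for whichever scaler you already use.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one numeric column and one string-valued categorical column.
df = pd.DataFrame({
    'age': [20.0, 30.0, 40.0],
    'city': ['a', 'b', 'c'],
})

ct = ColumnTransformer(
    [
        # Scale the numeric column.
        ('num', StandardScaler(), ['age']),
        # One-hot encode the string column; no LabelEncoder step is needed.
        # handle_unknown='ignore' encodes unseen categories as all zeros.
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['city']),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# One scaled column followed by one indicator column per category.
X = ct.fit_transform(df)
```

Here X has four columns: the scaled 'age' followed by one indicator column for each of the three 'city' values.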

Upvotes: 1

Randy

Reputation: 14849

One option is to use pd.get_dummies(...) from pandas:

In [10]: df = pd.DataFrame({'a': ['A', 'B', 'C']*3})

In [11]: df
Out[11]:
   a
0  A
1  B
2  C
3  A
4  B
5  C
6  A
7  B
8  C

In [12]: pd.concat([df, pd.get_dummies(df['a'], prefix='a')], axis=1)
Out[12]:
   a  a_A  a_B  a_C
0  A    1    0    0
1  B    0    1    0
2  C    0    0    1
3  A    1    0    0
4  B    0    1    0
5  C    0    0    1
6  A    1    0    0
7  B    0    1    0
8  C    0    0    1

Upvotes: 0
