Reputation: 614
I'm using ColumnTransformer from sklearn https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html
And I think it's a great tool for managing different scalers; it makes life a lot easier. However, I'm not sure whether there's a way to handle categorical values, since I need to first use a LabelEncoder and then one-hot encode the resulting integers. Is there anything that does both of these steps for me? I'd prefer to have all my scaling done in one ColumnTransformer. It's much easier to manage.
Upvotes: 0
Views: 425
Reputation: 29732
sklearn.preprocessing.OneHotEncoder can take care of categorical values directly, without first encoding them as ints:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
df = pd.DataFrame({'col1': ['a', 'b', 'c']})
ohe = OneHotEncoder(sparse=False)  # in scikit-learn >= 1.2 this parameter is named sparse_output
ohe.fit_transform(df['col1'].values.reshape(-1, 1))
Output:
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
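Since you want everything inside one ColumnTransformer, here is a minimal sketch (column names are made up for illustration) that scales a numeric column and one-hot encodes a string column in the same transformer:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# hypothetical example data: one numeric and one categorical column
df = pd.DataFrame({
    'num': [1.0, 2.0, 3.0],
    'cat': ['a', 'b', 'a'],
})

# scale the numeric column, one-hot encode the string column directly
ct = ColumnTransformer([
    ('scale', StandardScaler(), ['num']),
    ('onehot', OneHotEncoder(), ['cat']),
])
result = ct.fit_transform(df)
print(result.shape)  # 3 rows: 1 scaled column + 2 one-hot columns
```

No LabelEncoder is needed; OneHotEncoder accepts the raw string column.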
Upvotes: 1
Reputation: 14849
One option is to use pd.get_dummies(...) from pandas:
In [10]: df = pd.DataFrame({'a': ['A', 'B', 'C']*3})
In [11]: df
Out[11]:
a
0 A
1 B
2 C
3 A
4 B
5 C
6 A
7 B
8 C
In [12]: pd.concat([df, pd.get_dummies(df['a'], prefix='a')], axis=1)
Out[12]:
a a_A a_B a_C
0 A 1 0 0
1 B 0 1 0
2 C 0 0 1
3 A 1 0 0
4 B 0 1 0
5 C 0 0 1
6 A 1 0 0
7 B 0 1 0
8 C 0 0 1
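As a side note, passing the whole frame with the columns= argument does the dummify-and-concatenate step in one call (equivalent to the snippet above, except that the original column is dropped):

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'B', 'C'] * 3})
# columns= selects which columns to dummify; the original 'a' column
# is replaced by the prefixed indicator columns
out = pd.get_dummies(df, columns=['a'], prefix='a')
print(out.columns.tolist())  # ['a_A', 'a_B', 'a_C']
```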
Upvotes: 0