moufkir
moufkir

Reputation: 81

How to hot encode features preferably with a code example

I have a data frame like this
CODE | TYPE
0001 | A
0001 | B
0001 | C
0002 | A
0003 | B
....

and need to transform it to the following
CODE | TYPE_A | TYPE_B | TYPE_C
0001 | 1 | 1 | 1
0002 | 1 | 0 | 0
0003 | 0 | 1 | 0

Thank you in advance

Upvotes: 0

Views: 38

Answers (1)

tomp
tomp

Reputation: 1220

You can use the get_dummies function from pandas. Dummy variables are just another way of saying on-hot-encoding.

import pandas as pd
df = pd.DataFrame({'CODE': ['0001', '0001', '0001', '0002','0003'], 
                   'TYPE': ['A', 'B', 'C', 'A', 'B']})
pd.get_dummies(df, columns=['TYPE'])

The columns argument let's you specify the columns you want to one-hot-encode.

This will give:

   CODE  TYPE_A  TYPE_B  TYPE_C
0  0001       1       0       0
1  0001       0       1       0
2  0001       0       0       1
3  0002       1       0       0
4  0003       0       1       0

Upvotes: 1

Related Questions