Reputation: 81
I have a data frame like this
CODE | TYPE
0001 | A
0001 | B
0001 | C
0002 | A
0003 | B
....
and need to transform it to the following
CODE | TYPE_A | TYPE_B | TYPE_C
0001 | 1 | 1 | 1
0002 | 1 | 0 | 0
0003 | 0 | 1 | 0
Thank you in advance
Upvotes: 0
Views: 38
Reputation: 1220
You can use the get_dummies function from pandas. Dummy variables are just another way of saying on-hot-encoding.
import pandas as pd
df = pd.DataFrame({'CODE': ['0001', '0001', '0001', '0002','0003'],
'TYPE': ['A', 'B', 'C', 'A', 'B']})
pd.get_dummies(df, columns=['TYPE'])
The columns
argument let's you specify the columns you want to one-hot-encode.
This will give:
CODE TYPE_A TYPE_B TYPE_C
0 0001 1 0 0
1 0001 0 1 0
2 0001 0 0 1
3 0002 1 0 0
4 0003 0 1 0
Upvotes: 1