Reputation: 2184
I have categorical data (test_data) like this:
s.no Product_Category_1 Product_Category_2 Product_Category_3
0 3 NaN NaN
1 1 6 14
2 12 NaN NaN
3 12 14 NaN
4 8 NaN NaN
5 1 2 NaN
I want to convert it into binary data like:
s.no 1 2 3 6 8 12 14
0 0 0 1 0 0 0 0
1 1 0 0 1 0 0 1
2 0 0 0 0 0 1 0
3 0 0 0 0 0 1 1
4 0 0 0 0 1 0 0
5 1 1 0 0 0 0 0
I understand that I need one-hot encoding for this. I am using Python's pandas. I tried the get_dummies
function, but it does not work on the whole DataFrame.
Upvotes: 1
Views: 1274
Reputation: 176968
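You could set 's.no' as the index first (if it isn't already) and unstack to get the columns into a single Series. You can then use get_dummies
and sum over the second level of the resulting MultiIndex (the original 's.no' values) to get the result:
df = df.set_index('s.no')
# sum(level=1) was removed in pandas 2.0; group on the index level instead
pd.get_dummies(df.unstack()).groupby(level=1).sum()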
which yields:
1 2 3 6 8 12 14
s.no
0 0 0 1 0 0 0 0
1 1 0 0 1 0 0 1
2 0 0 0 0 0 1 0
3 0 0 0 0 0 1 1
4 0 0 0 0 1 0 0
5 1 1 0 0 0 0 0
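For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question; the names `stacked` and `result`, the explicit `dropna()` and the cast to `int` are my own additions (assumptions, not part of the original snippet), added so the column labels come out as integers rather than floats.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; NaN marks a missing category.
df = pd.DataFrame(
    {
        "s.no": [0, 1, 2, 3, 4, 5],
        "Product_Category_1": [3, 1, 12, 12, 8, 1],
        "Product_Category_2": [np.nan, 6, np.nan, 14, np.nan, 2],
        "Product_Category_3": [np.nan, 14, np.nan, np.nan, np.nan, np.nan],
    }
).set_index("s.no")

# Move all three columns into one long Series indexed by (column name, s.no),
# drop the missing entries, and cast the remaining category codes to int.
stacked = df.unstack().dropna().astype(int)

# One indicator column per category, then add the indicators up per s.no
# (level 1 of the MultiIndex holds the original s.no values).
result = pd.get_dummies(stacked).groupby(level=1).sum()

print(result)
```

If the same category could appear in more than one column of a single row, the sum would exceed 1 for that cell; appending `.clip(upper=1)` to the last expression should cap it at 1.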
Upvotes: 1