saurabh agarwal

Reputation: 2184

Convert categorical data (in multiple columns) to binary data

I have categorical data (test_data) like this:

s.no    Product_Category_1  Product_Category_2  Product_Category_3
0            3                   NaN                 NaN
1            1                    6                  14
2            12                  NaN                 NaN
3            12                  14                  NaN
4            8                   NaN                 NaN
5            1                    2                  NaN

I want to convert it into binary (one-hot) data like this:

s.no    1   2   3   6    8  12   14
0       0   0   1   0    0   0   0
1       1   0   0   1    0   0   1
2       0   0   0   0    0   1   0
3       0   0   0   0    0   1   1
4       0   0   0   0    1   0   0
5       1   1   0   0    0   0   0

I understand that I need one-hot encoding for this, and I am using Python's pandas. I tried the get_dummies function, but it does not give this result when applied to the whole DataFrame.
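A likely reason get_dummies seems not to work on the whole DataFrame is that, when no columns argument is given, it only encodes object, string and category columns. These columns are numeric (float where NaN appears), so they come back unchanged. Naming the columns explicitly does encode them, but each column separately, so a value such as 14 gets one dummy column per source column. The sketch below rebuilds the sample data by hand, assuming 's.no' is the index:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with 's.no' used as the index
test_data = pd.DataFrame(
    {
        "Product_Category_1": [3, 1, 12, 12, 8, 1],
        "Product_Category_2": [np.nan, 6, np.nan, 14, np.nan, 2],
        "Product_Category_3": [np.nan, 14, np.nan, np.nan, np.nan, np.nan],
    },
    index=pd.Index([0, 1, 2, 3, 4, 5], name="s.no"),
)

# With no column list, get_dummies only encodes object/string/category
# columns, so these numeric columns are returned unchanged
unchanged = pd.get_dummies(test_data)

# Listing the columns encodes each one separately: every dummy column is
# prefixed with its source column name, so value 14 shows up twice
per_column = pd.get_dummies(test_data, columns=list(test_data.columns))
print(per_column.columns.tolist())
```

That gives 8 prefixed columns (4 + 3 + 1 distinct values) instead of the 7 combined columns wanted, which is why the columns first need to be stacked into one Series.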

Upvotes: 1

Views: 1274

Answers (1)

Alex Riley

Reputation: 176968

You could set 's.no' as the index first (if it isn't already) and then unstack the DataFrame, which stacks all of its columns into a single Series indexed by (column name, s.no). Apply get_dummies to that Series and sum over the 's.no' level (level 1) of the MultiIndex to get the result:

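import pandas as pd
df = df.set_index('s.no')
# Series.sum(level=...) was removed in pandas 2.0; group on the index level instead
pd.get_dummies(df.unstack()).groupby(level=1).sum()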

which yields:

      1   2   3   6   8   12  14
s.no                            
0      0   0   1   0   0   0   0
1      1   0   0   1   0   0   1
2      0   0   0   0   0   1   0
3      0   0   0   0   0   1   1
4      0   0   0   0   1   0   0
5      1   1   0   0   0   0   0
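A self-contained sketch of the same unstack/get_dummies approach, rebuilding the question's sample data by hand. The names stacked and binary are illustrative, and the dropna().astype(int) and clip(upper=1) steps are additions not in the answer above: the first keeps the dummy column labels as integers rather than floats (the NaNs otherwise force a float dtype), the second keeps the result strictly 0/1 if a row ever repeats a value across columns.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN marks an empty category slot
df = pd.DataFrame({
    "s.no": [0, 1, 2, 3, 4, 5],
    "Product_Category_1": [3, 1, 12, 12, 8, 1],
    "Product_Category_2": [np.nan, 6, np.nan, 14, np.nan, 2],
    "Product_Category_3": [np.nan, 14, np.nan, np.nan, np.nan, np.nan],
})

# Stack all category columns into one Series indexed by (column, s.no),
# dropping the NaN slots so the values can be cast back to int
stacked = df.set_index("s.no").unstack().dropna().astype(int)

# One dummy column per distinct value, then collapse rows sharing the same
# s.no; clip(upper=1) keeps the output binary if a row repeats a value
binary = pd.get_dummies(stacked).groupby(level="s.no").sum().clip(upper=1)
print(binary)
```

Because dropna() removes the empty slots, an s.no whose categories were all NaN would drop out of the result; binary.reindex(df["s.no"], fill_value=0) would bring it back as an all-zero row.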

Upvotes: 1
