Reputation: 9
I have to perform one-hot encoding on the data frames that contain the survey results.
In this data frames, they represented 'multiple response' in multiple columns. for example, if someone choose '1' and '3', it is represented by '1'in first columns and '3' in second columns.
I need to organize the data like this way
Quesion1 | Quesion1 |
---|---|
1 | 3 |
2 | 4 |
1 | 5 |
-->
1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|
1 | 0 | 1 | 0 | 0 |
0 | 1 | 0 | 1 | 0 |
1 | 0 | 0 | 0 | 1 |
I tried to use 'get_dummies' to do one-hot encoding. but two columns are treated as different question. ('2' in first question and '2' in second question are treated like different values)
Is there any good solution to solve this problem?
Upvotes: 0
Views: 63
Reputation: 1097
You need to use .get_dummies()
to converto to one-hot for each column and then join them using pd.concat()
:
df2 = pd.concat([pd.get_dummies(df.loc[:, column]) for column in df.columns], axis=1)
print(df2)
Output:
1 2 3 4 5
0 1 0 1 0 0
1 0 1 0 1 0
2 1 0 0 0 1
Upvotes: 0
Reputation: 262359
s = df.stack().droplevel(1)
out = pd.crosstab(s.index, s).rename_axis(index=None, columns=None)
Output:
1 2 3 4 5
0 1 0 1 0 0
1 0 1 0 1 0
2 1 0 0 0 1
Upvotes: 0
Reputation: 18315
>>> pd.get_dummies(df.stack()).groupby(level=0).sum().clip(upper=1)
1 2 3 4 5
0 1 0 1 0 0
1 0 1 0 1 0
2 1 0 0 0 1
stack
Upvotes: 1