
Reputation: 9

python pandas one-hot encoding in several columns for the same question

I have to one-hot encode data frames that contain survey results.

In these data frames, a multiple-response question is spread across several columns. For example, if someone chooses '1' and '3', the '1' goes in the first column and the '3' in the second.

I need to reorganize the data like this:

Question1	Question1
1	3
2	4
1	5

-->

1	2	3	4	5
1	0	1	0	0
0	1	0	1	0
1	0	0	0	1

I tried 'get_dummies' for the one-hot encoding, but the two columns are treated as different questions (a '2' in the first column and a '2' in the second column become different values).

Is there a good way to solve this?
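
For reference, a minimal sketch that reproduces the problem with the data above. The column names Question1_a and Question1_b are assumptions made only so that the two labels are unique; the original frame may name them differently.

```python
import pandas as pd

# Hypothetical column names: the question labels both columns the same,
# so distinct suffixes are assumed here to keep the labels unique.
df = pd.DataFrame({"Question1_a": [1, 2, 1], "Question1_b": [3, 4, 5]})

# Encoding the frame directly keeps the two columns apart: each dummy
# column is prefixed with the name of the column it came from.
separate = pd.get_dummies(df, columns=list(df.columns), dtype=int)
print(separate.columns.tolist())
```

Because of the prefixes, a '2' in the first column and a '2' in the second would end up in two separate dummy columns instead of one.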

Upvotes: 0

Answers (3)

ali bakhtiari

Reputation: 1097

You need to use pd.get_dummies() to convert each column to one-hot form and then join the results using pd.concat():

# dtype=int gives 0/1 columns; pandas 2.0+ defaults to True/False
df2 = pd.concat([pd.get_dummies(df[column], dtype=int) for column in df.columns], axis=1)
print(df2)

Output:

   1  2  3  4  5
0  1  0  1  0  0
1  0  1  0  1  0
2  1  0  0  0  1
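
One caveat with this approach: if the same value occurs in more than one column, the concatenated result contains duplicate column labels. The sketch below shows this on hypothetical data (column names and values are made up for illustration) and one possible way to merge the duplicates; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical data: the answer 1 appears in both columns.
df = pd.DataFrame({"first": [1, 2, 1], "second": [3, 1, 5]})

# Per-column dummies, concatenated side by side as in the answer above.
wide = pd.concat([pd.get_dummies(df[col], dtype=int) for col in df.columns], axis=1)
print(wide.columns.tolist())  # the label 1 appears twice

# Collapse columns that share a label, keeping 1 if any of them is 1.
merged = wide.T.groupby(level=0).max().T
print(merged)
```

Transposing before grouping is just one way to group by column label; it keeps the sketch independent of the axis argument of groupby.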

Upvotes: 0

mozway

Reputation: 262359

Using stack and crosstab:

s = df.stack().droplevel(1)

out = pd.crosstab(s.index, s).rename_axis(index=None, columns=None)

Output:

   1  2  3  4  5
0  1  0  1  0  0
1  0  1  0  1  0
2  1  0  0  0  1
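
A self-contained sketch of the same idea, with the intermediate Series spelled out. The column names are assumptions, and the final clip is an addition that is not in the answer above: crosstab returns counts, so a row that repeats an answer would otherwise show 2 rather than 1.

```python
import pandas as pd

# Hypothetical column names; the data is the example from the question.
df = pd.DataFrame({"first": [1, 2, 1], "second": [3, 4, 5]})

# stack() puts every answer in one long Series; dropping the column level
# leaves only the original row label as the index.
s = df.stack().droplevel(1)

# crosstab counts how often each answer occurs per row. The clip is an
# addition here: it caps the counts at 1 in case a row repeats an answer.
out = pd.crosstab(s.index, s).rename_axis(index=None, columns=None).clip(upper=1)
print(out)
```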

Upvotes: 0

Mustafa Aydın

Reputation: 18315

>>> pd.get_dummies(df.stack()).groupby(level=0).sum().clip(upper=1)

   1  2  3  4  5
0  1  0  1  0  0
1  0  1  0  1  0
2  1  0  0  0  1

Flatten the frame with stack.
Get the dummies.
Group by each original row (now at level=0).
Sum the 1s, but clip at a maximum of 1 in case an answer occurs more than once in a row (i.e., the same answer given in several columns).
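
The steps above can be sketched one at a time. The data here is hypothetical and chosen so that one row repeats an answer, which is the case the clip is meant to handle; dtype=int is passed only so the intermediate frames show 0/1.

```python
import pandas as pd

# Hypothetical data: row 1 gives the answer 2 in both columns.
df = pd.DataFrame({"first": [1, 2, 1], "second": [3, 2, 5]})

stacked = df.stack()                          # one long Series, indexed by (row, column)
dummies = pd.get_dummies(stacked, dtype=int)  # one indicator column per distinct answer
counts = dummies.groupby(level=0).sum()       # add the indicators up per original row
out = counts.clip(upper=1)                    # cap repeated answers at 1

print(counts.loc[1, 2])  # 2 before clipping
print(out)
```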

Upvotes: 1
