최명현
최명현

Reputation: 9

python pandas one- hot encoding in several columns for the same question

I have to perform one-hot encoding on the data frames that contain the survey results.

In this data frames, they represented 'multiple response' in multiple columns. for example, if someone choose '1' and '3', it is represented by '1'in first columns and '3' in second columns.

I need to organize the data like this way

Quesion1 Quesion1
1 3
2 4
1 5

-->

1 2 3 4 5
1 0 1 0 0
0 1 0 1 0
1 0 0 0 1

I tried to use 'get_dummies' to do one-hot encoding. but two columns are treated as different question. ('2' in first question and '2' in second question are treated like different values)

Is there any good solution to solve this problem?

Upvotes: 0

Views: 63

Answers (3)

ali bakhtiari
ali bakhtiari

Reputation: 1097

You need to use .get_dummies() to converto to one-hot for each column and then join them using pd.concat():

df2 = pd.concat([pd.get_dummies(df.loc[:, column]) for column in df.columns], axis=1)
print(df2)

Output:

   1  2  3  4  5
0  1  0  1  0  0
1  0  1  0  1  0
2  1  0  0  0  1

Upvotes: 0

mozway
mozway

Reputation: 262359

Using stack and crosstab:

s = df.stack().droplevel(1)

out = pd.crosstab(s.index, s).rename_axis(index=None, columns=None)

Output:

   1  2  3  4  5
0  1  0  1  0  0
1  0  1  0  1  0
2  1  0  0  0  1

Upvotes: 0

Mustafa Aydın
Mustafa Aydın

Reputation: 18315

>>> pd.get_dummies(df.stack()).groupby(level=0).sum().clip(upper=1)

   1  2  3  4  5
0  1  0  1  0  0
1  0  1  0  1  0
2  1  0  0  0  1
  • flatten the frame with stack
  • get the dummies
  • groupby each row (now at level=0)
  • sum the 1s but max-clip at 1 for possible multiple occurences (i.e., same answers for questions)

Upvotes: 1

Related Questions