Reputation: 2655
I have a pandas dataframe of the form:
df
   col_1 col_2 col_3 col_4
ID
1      A     B     C     A
2      B     D
3      A     C     B
import numpy as np
import pandas as pd
df = pd.DataFrame({'col_1':['A','B','A'], 'col_2':['B','D','C'], 'col_3':['C',np.nan,'B'], 'col_4':['A', np.nan, np.nan]}, index=[1,2,3])
Note that the values repeated across the columns are not accidental: they refer to the same entities (A in col_1 is the same as A in col_4, for instance). I am trying to pivot the values of this dataframe so that the unique values become the new columns. For instance, df would become:
new_df
    A  B  C  D
ID
1   2  1  1  0
2   0  1  0  1
3   1  1  1  0
The new values represent counts. I have tried pd.get_dummies() but it doesn't give me what I want. What is the most intuitive way to achieve this?
Upvotes: 3
Views: 179
Reputation: 323316
IIUC, you can use stack with str.get_dummies, then sum the indicator columns within each ID:
df.stack().loc[lambda x : x!=''].str.get_dummies().groupby(level=0).sum()
    A  B  C  D
ID
1   2  1  1  0
2   0  1  0  1
3   1  1  1  0
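In case the one-liner is hard to follow, here is the same idea split into steps, as a minimal sketch rather than the only way to do it. It assumes the blanks in the question are NaN (as in the constructor shown there) and also filters out empty strings in case they are not; naming the index 'ID' is my own addition so the output matches the printed frames. The per-ID totals are taken with groupby(level=0).sum(), which works on current pandas versions.

```python
import numpy as np
import pandas as pd

# Same data as in the question; the index name 'ID' is an assumption
# made here so the printed result matches the frames shown above.
df = pd.DataFrame(
    {
        'col_1': ['A', 'B', 'A'],
        'col_2': ['B', 'D', 'C'],
        'col_3': ['C', np.nan, 'B'],
        'col_4': ['A', np.nan, np.nan],
    },
    index=pd.Index([1, 2, 3], name='ID'),
)

# Step 1: reshape to one long Series indexed by (ID, original column name).
# Missing values are dropped explicitly instead of relying on stack() defaults.
stacked = df.stack().dropna()

# Step 2: also drop empty strings, in case blanks were stored as '' not NaN.
stacked = stacked[stacked != '']

# Step 3: one 0/1 indicator column per distinct value (A, B, C, D).
dummies = stacked.str.get_dummies()

# Step 4: add the indicators up within each ID (index level 0) to get counts.
new_df = dummies.groupby(level=0).sum()

print(new_df)
```

Each row of stacked holds a single value, so the indicator frame has exactly one 1 per row, and summing per ID turns those indicators into counts.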
Upvotes: 4