Reputation: 21
I have the following dataframe loaded up in Pandas.
print(pandaDf)
id col1 col2 col3
12a a b d
22b d a b
33c c a b
I am trying to convert the values across multiple rows into its columns so the output would be like this :
Desired output:
id a b c d
12a 1 1 0 1
22b 1 1 0 0
33c 1 1 1 0
I have tried adding in a value column where the value = 1 and using a pivot table
pandaDf['value'] = 1
column = ['col1', 'col2', 'col3']
pandaDf.pivot_table(index = 'id', value = 'value', columns = column)
However, the resulting data frame is a multilevel index and the pandaDf.pivot() method does not allow multiple column values.
Please advise about how I could do this with an output of a single level index.
Thanks for taking the time to read this and I apologize if I have made any formatting errors in posting the question. I am still learning the proper stackoverflow syntax.
Upvotes: 2
Views: 73
Reputation: 153510
You can use One-Hot Encoding to solve this problem.
Here is one way to do this pd.get_dummies
and some multiindex flatten and sum
:
df1 = df.set_index('id')
df_out = pd.get_dummies(df1)
df_out.columns = df_out.columns.str.split('_', expand=True)
df_out = df_out.sum(level=1, axis=1).reset_index()
print(df_out)
Output:
id a c d b
0 12a 1 0 1 1
1 22b 1 0 1 1
2 33c 1 1 0 1
Upvotes: 3
Reputation: 323326
Using get_dummies
pd.get_dummies(df.set_index('id'),prefix='', prefix_sep='').sum(level=0,axis=1)
Out[81]:
a c d b
id
12a 1 0 1 1
22b 1 0 1 1
33c 1 1 0 1
Upvotes: 2