Reputation: 2615
I have the following dataframe:
Tran ID Category Quantity
0 001 A 5
1 001 B 2
2 001 C 3
3 002 A 4
4 002 C 2
5 003 D 6
I want to transform it into:
Tran ID A B C D Quantity
0 001 True True True False 10
1 002 True False True False 6
2 003 False False False True 6
I know I can use groupby
to get the sum of quantity, but I can't figure out how to perform the pivot that I described.
Upvotes: 1
Views: 61
Reputation: 71610
Or you can use:
print(df.drop('Category',1).join(df['Category'].str.get_dummies().astype(bool)).groupby('Tran ID',as_index=False).sum())
Or little easier to read:
df1 = df.drop('Category',1).join(df['Category'].str.get_dummies().astype(bool))
print(df1.groupby('Tran ID',as_index=False).sum())
Both output:
Tran ID Quantity A B C D
0 1 10 True True True False
1 2 6 True False True False
2 3 6 False False False True
pandas.DataFrame.groupby
with pandas.Series.str.get_dummies
is the way to do it.
Upvotes: 2
Reputation: 863166
Use get_dummies
for indicators with max
and add new column with aggregating sum
:
#pandas 0.23+
df1 = pd.get_dummies(df.set_index('Tran ID')['Category'], dtype=bool).max(level=0)
#oldier pandas versions
#df1 = pd.get_dummies(df.set_index('Tran ID')['Category']).astype(bool).max(level=0)
s = df.groupby('Tran ID')['Quantity'].sum()
df2 = df1.assign(Quantity = s).reset_index()
print (df2)
Tran ID A B C D Quantity
0 001 True True True False 10
1 002 True False True False 6
2 003 False False False True 6
Upvotes: 2