Reputation: 4498
Working with dataframe df:
User_ID | Transaction_ID | Transaction_Row | Category
3824739 123 -1 A
3824739 123 -1 A
2398473 345 0 A
1230984 567 1 C
I need to pivot the above data by Category and sum Transaction_Row. However, I need to groupby Transaction ID, so that for Transaction ID 123 above, I only count the -1 once.
Can I do this with a pandas pivot table or only with a groupby?
pd.pivot_table(df,index=["Category"],values=["Transaction_Row"],aggfunc=np.sum)
Current Output:
Category | Sum of Transaction_Row
A -2
C 1
Desired Output:
Category | Sum of Transaction_Row
A -1
C 1
I don't know, how to edit the statement above to fix the double counting issue.
Thank You!
Upvotes: 0
Views: 203
Reputation: 494
I hope I got your question right. First, drop duplicates based on Transaction_ID and Transaction_Row only. Then do the pivot.
df_2 = df.drop_duplicates(subset=['Transaction_ID', 'Transaction_Row'])
pd.pivot_table(df_2, index=["Category"], values=["Transaction_Row"], aggfunc=np.sum)
Upvotes: 2