Python pandas unique de-duped sum pivot table

Question

Working with dataframe df:

User_ID | Transaction_ID | Transaction_Row | Category
3824739         123               -1           A
3824739         123               -1           A
2398473         345               0            A
1230984         567               1            C

I need to pivot the above data by Category and sum Transaction_Row. However, I need to groupby Transaction ID, so that for Transaction ID 123 above, I only count the -1 once.

Can I do this with a pandas pivot table or only with a groupby?

pd.pivot_table(df,index=["Category"],values=["Transaction_Row"],aggfunc=np.sum)

Current Output:

Category | Sum of Transaction_Row
   A               -2
   C                1

Desired Output:

Category | Sum of Transaction_Row
   A               -1
   C                1

I don't know, how to edit the statement above to fix the double counting issue.

Thank You!

D_Serg · Accepted Answer

I hope I got your question right. First, drop duplicates based on Transaction_ID and Transaction_Row only. Then do the pivot.

df_2 = df.drop_duplicates(subset=['Transaction_ID', 'Transaction_Row']) 
pd.pivot_table(df_2, index=["Category"], values=["Transaction_Row"], aggfunc=np.sum)

Python pandas unique de-duped sum pivot table

Answers (1)

Related Questions