Reputation: 115
So below I have an example of a DataFrame where throughout, there will be multiple instances like the SALES TAX EXPENSE lines, where if there is a duplicate, it will need to be converted into one line where the total of Trans_Amt should be Trans_Type C - Trans_Type D.
So for example in this DF, there should only be one line for SALES TAX EXPENSE, and the total should be -36239.65.
This happens multiple times throughout the DF, with multiple different ActNames. I'm looking for insight as to the most efficient way to do this transformation and apply it to any instance where this occurs.
Thank you!
ActName ActCode Trans_Type Trans_Amt
0 SALES 401 C 2082748.85
1 SALES TAX EXPENSE 407 C 100000.00
30 DISCOUNTS 405 D -654.59
31 SALES TAX EXPENSE 407 D 136239.65
Upvotes: 1
Views: 41
Reputation: 38415
Group data by columns and take assign difference to Amt. Then drop duplicates.
df['Trans_Amt'] = df.groupby(['ActName','ActCode']).Trans_Amt.apply(lambda x: x.diff(periods=-1)).combine_first(df['Trans_Amt'])
df.drop_duplicates('ActName')
ActName ActCode Trans_Type Trans_Amt
0 SALES 401 C 2082748.85
1 SALES TAX EXPENSE 407 C -36239.65
30 DISCOUNTS 405 D -654.59
Edit: Based on follow-up question. If the difference should be with the previous row, try
df['Trans_Amt'] = df.groupby(['ActName','ActCode']).Trans_Amt.apply(lambda x: x.diff()).combine_first(df['Trans_Amt'])
df.drop_duplicates('ActName', keep='last')
ActName ActCode Trans_Type Trans_Amt
0 SALES 401 C 2082748.85
30 DISCOUNTS 405 D -654.59
31 SALES TAX EXPENSE 407 D 36239.65
Upvotes: 2