Reputation: 43
I'm trying to compute the running total of one category between two consecutive occurrences of another category.
This is hard to describe in words, so here is an example of the input dataframe and the expected output.
Input:
Date Category Value
2012-01-04 A 10
2012-01-06 A 20
2012-02-15 B -10
2012-04-29 A 5
2012-04-30 A 70
2012-10-15 A 15
2012-10-16 B -30
2012-11-19 B -50
Expected output: only the B rows, each with the total of the A values since the previous occurrence of B
Date Category Value Total_A_since_previous_B
2012-02-15 B -10 30
2012-10-16 B -30 90
2012-11-19 B -50 0
I've tried several things without success.
Can you help me to understand how to do that?
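For anyone who wants to reproduce this, the input above can be built roughly as follows. This is a minimal sketch: parsing Date with pd.to_datetime and using the default integer index are assumptions, since the post does not state the dtypes.

```python
import pandas as pd

# Sample input from the question; the datetime dtype for Date is an assumption
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-01-04", "2012-01-06", "2012-02-15", "2012-04-29",
        "2012-04-30", "2012-10-15", "2012-10-16", "2012-11-19",
    ]),
    "Category": ["A", "A", "B", "A", "A", "A", "B", "B"],
    "Value": [10, 20, -10, 5, 70, 15, -30, -50],
})
print(df)
```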
Upvotes: 4
Views: 149
Reputation: 30920
Use GroupBy.agg:
# Label the rows by block: a new block starts on the row right after each B
blocks = df.Category.shift().eq('B').cumsum()
new_df = (df.groupby(blocks)
            .agg(Date=('Date', 'last'),
                 Category=('Category', 'last'),
                 Value=('Value', 'last'),
                 Total_A_since_previous_B=('Value', 'sum'))
            # the block sum includes the closing B row, so subtract its Value
            .assign(Total_A_since_previous_B=lambda x: x.Total_A_since_previous_B
                                                        .sub(x.Value))
            .reset_index(drop=True))
print(new_df)
Date Category Value Total_A_since_previous_B
0 2012-02-15 B -10 30
1 2012-10-16 B -30 90
2 2012-11-19 B -50 0
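To see why this works, it may help to look at the intermediate block labels. The sketch below rebuilds only the Category and Value columns from the question (the column name block and the variables grouped and totals are introduced here for illustration). Note that the approach seems to rely on every block ending with a B row and containing only A rows before it; trailing A rows after the last B would form a block whose last row is not a B.

```python
import pandas as pd

# Category and Value columns from the question's sample data
df = pd.DataFrame({
    "Category": ["A", "A", "B", "A", "A", "A", "B", "B"],
    "Value": [10, 20, -10, 5, 70, 15, -30, -50],
})

# shift() moves each label down one row, so the counter increases
# on the row *after* a B: every block therefore ends with its B row
blocks = df["Category"].shift().eq("B").cumsum()
print(df.assign(block=blocks))

# Sum of each block minus its last (B) value leaves the sum of the A rows
grouped = df.groupby(blocks)["Value"]
totals = grouped.sum() - grouped.last()
print(totals.tolist())  # [30, 90, 0]
```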
Upvotes: 2
Reputation: 19947
First create a group for each occurrence of B, sum the Values in each group excluding the B row itself, and then assign the result as a new column on the filtered df.
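import numpy as np
import pandas as pd

(
    # label every row with the index of the next B row (a B row labels itself)
    pd.Series(np.where(df.Category.eq('B'), df.index, np.nan)).bfill()
    # per label, sum everything except the last row, which is the B row
    .pipe(lambda x: df.groupby(x).Value.apply(lambda x: x[:-1].sum()))
    .pipe(lambda x: df[df.Category=='B'].assign(Total_A_since_previous_B=x))
)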
Date Category Value Total_A_since_previous_B
2 2012-02-15 B -10 30
6 2012-10-16 B -30 90
7 2012-11-19 B -50 0
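As a variation that is not taken from either answer, the same grouping idea can be written with a per-block cumulative sum of the A values only. This is a sketch under the assumption that rows after the last B, or rows of other categories, should simply be ignored; the function name total_a_since_previous_b is made up for illustration.

```python
import pandas as pd

def total_a_since_previous_b(df):
    """Return the B rows with the sum of A values since the previous B."""
    is_b = df["Category"].eq("B")
    # A new block starts on the row after each B
    blocks = is_b.shift(fill_value=False).cumsum()
    # Count only A values; every other row contributes 0
    a_values = df["Value"].where(df["Category"].eq("A"), 0)
    # Running total inside each block, read off at the B rows
    running = a_values.groupby(blocks).cumsum()
    return df.assign(Total_A_since_previous_B=running)[is_b]

df = pd.DataFrame({
    "Category": ["A", "A", "B", "A", "A", "A", "B", "B"],
    "Value": [10, 20, -10, 5, 70, 15, -30, -50],
})
result = total_a_since_previous_b(df)
print(result)
```

Because the B rows are selected with a boolean mask at the end, the original index (2, 6, 7) is preserved, as in the output above.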
Upvotes: 2