Reputation: 1663
I have the following dataframe df
:
bucket_value is_new_bucket
dates
2019-03-07 0 1
2019-03-08 1 0
2019-03-09 2 0
2019-03-10 3 0
2019-03-11 4 0
2019-03-12 5 1
2019-03-13 6 0
2019-03-14 7 1
I want to apply a specific function (let’s say the mean function) to each bucket_value
data groups where the column is_new_bucket
is equal to zero, such that the resulting dataframe would look like this:
mean_values
dates
2019-03-08 2.5
2019-03-13 6.0
In other words, applying a function to the consecutive rows where is_new_bucket = 0
, which takes the bucket_value
as input.
For instance, if I want to apply the max function, the resulting dataframe would look like this:
max_values
dates
2019-03-11 4.0
2019-03-13 6.0
Upvotes: 1
Views: 219
Reputation: 323226
Using cumsum
with filter
df.reset_index(inplace=True)
s=df.loc[df.is_new_bucket==0].groupby(df.is_new_bucket.cumsum()).agg({'date':'first','bucket_value':['mean','max']})
s
date bucket_value
first mean max
is_new_bucket
1 2019-03-08 2.5 4
2 2019-03-13 6.0 6
Updated
df.loc[df.loc[df.is_new_bucket==0].groupby(df.is_new_bucket.cumsum())['bucket_value'].idxmax()]
date bucket_value is_new_bucket
4 2019-03-11 4 0
6 2019-03-13 6 0
Updated2 after using the cumsum
create the group key Newkey , you can do whatever you need , base on the groupkey
df['Newkey']=df.is_new_bucket.cumsum()
df
date bucket_value is_new_bucket Newkey
0 2019-03-07 0 1 1
1 2019-03-08 1 0 1
2 2019-03-09 2 0 1
3 2019-03-10 3 0 1
4 2019-03-11 4 0 1
5 2019-03-12 5 1 2
6 2019-03-13 6 0 2
7 2019-03-14 7 1 3
Upvotes: 2