Reputation: 2810
I have a dataframe with Ids and a target value per each Id. my goal is to find the max no. of consecutive occurrences of 1's
in each group of the data frame.
original dataframe is as below
df = pd.DataFrame({
'Id': {0: 1,1: 1,2: 1,3: 2,4: 2,5: 2,6: 2,7: 2, 8: 2,9: 2, 10: 2, 11: 2, 12: 2},
't_year': {0: 2021, 1: 2022, 2: 2022, 3: 2021,4: 2021,5: 2021,6: 2021,7: 2021,8: 2021,9: 2021,10: 2021,11: 2022,12: 2022},
't_month_prx': {0: 10.0,1: 1.0,2: 2.0,3: 1.0,4: 2.0,5: 3.0,6: 6.0,7: 7.0,
8: 8.0,9: 9.0,10: 10.0,11: 1.0,12: 2.0},
'values': {0: 1.0, 1: 1.0, 2: 1.0,3: 1.0,4: 1.0,5: 1.0,6: 3.0,7: 1.0,8: 1.0,9: 1.0,10: 1.0,11: 1.0, 12: 1.0}
})
and my desired output is
pd.DataFrame({
'Id': {0: 1,1: 2},
'counts': {0: 3.0,1: 6.0}
})
Upvotes: 1
Views: 339
Reputation: 4229
You can first group consecutive values in the dataframe, then filter the dataframe where values contain 1
and then groupby to proceed further.
df['tmp'] = (df['values'] != df['values'].shift()).cumsum() #Group Consecutive values in Dataframe
df = df[df['values'].eq(1)].groupby(['Id', 'tmp'], as_index=False).count().groupby('Id', as_index=False)['values'].max()
OUTPUT After First Line
Id values tmp
0 1 1.0 1
1 1 1.0 1
2 1 1.0 1
3 2 1.0 1
4 2 1.0 1
5 2 1.0 1
6 2 3.0 2
7 2 1.0 3
8 2 1.0 3
9 2 1.0 3
10 2 1.0 3
11 2 1.0 3
12 2 1.0 3
Final OUTPUT
Id values
0 1 3
1 2 6
Upvotes: 0
Reputation:
You could create a mask with a unique value for each consecutive group of numbers (cumsum
+ ne
/!==
), and then groupby that and the ID, sum the numbers, and get the the max:
df.groupby([df['Id'], df['values'].ne(df.groupby('Id')['values'].shift(1)).cumsum()])['values'].sum().groupby(level=0).max().reset_index()
Output:
>>> df
Id values
0 1 3.0
1 2 6.0
Upvotes: 1