Reputation: 87
I'm having a dataframe in this form
col_1 group_ids value
0 Name_A 1 1
1 Name_B 2 4
2 Name_C 4 2
3 Name_D 5 10
4 Name_E 6 14
The goal is to sum all the values that the group_ids
are consecutive, which is to produce the following table
col_1 group_ids value consecutive_cumsum
0 Name_A 1 1 1
1 Name_B 2 4 5
2 Name_C 4 2 2
3 Name_D 5 10 12
4 Name_E 6 14 26
I got it to work with using itertuples()
. But what could be a more efficient way to deal with it? Many thanks!
Upvotes: 2
Views: 48
Reputation: 323306
We can try diff
with cumsum
then groupby this new key do cumsum
again
df.groupby(df.group_ids.diff().ne(1).cumsum()).value.cumsum()
0 1
1 5
2 2
3 12
4 26
Name: value, dtype: int64
Upvotes: 4
Reputation: 35155
Cumulate the previous line of 'group_ids' compared to the rest.
df['consecutive_cumsum'] = df['value'].groupby(((df['group_ids'] - df['group_ids'].shift()) != 1).cumsum()).cumsum()
col_1 group_ids value consecutive_cumsum
0 Name_A 1 1 1
1 Name_B 2 4 5
2 Name_C 4 2 2
3 Name_D 5 10 12
4 Name_E 6 14 26
Upvotes: 3