hoang_le_96
hoang_le_96

Reputation: 87

Cummulative sum between consecutive groups in a dataframe

I'm having a dataframe in this form

  col_1  group_ids value
0 Name_A     1       1
1 Name_B     2       4
2 Name_C     4       2
3 Name_D     5       10 
4 Name_E     6       14

The goal is to sum all the values that the group_ids are consecutive, which is to produce the following table

  col_1  group_ids value consecutive_cumsum
0 Name_A     1       1          1
1 Name_B     2       4          5
2 Name_C     4       2          2
3 Name_D     5       10         12
4 Name_E     6       14         26

I got it to work with using itertuples(). But what could be a more efficient way to deal with it? Many thanks!

Upvotes: 2

Views: 48

Answers (2)

BENY
BENY

Reputation: 323306

We can try diff with cumsum then groupby this new key do cumsum again

df.groupby(df.group_ids.diff().ne(1).cumsum()).value.cumsum()

0     1
1     5
2     2
3    12
4    26
Name: value, dtype: int64

Upvotes: 4

r-beginners
r-beginners

Reputation: 35155

Cumulate the previous line of 'group_ids' compared to the rest.

df['consecutive_cumsum'] = df['value'].groupby(((df['group_ids'] - df['group_ids'].shift()) != 1).cumsum()).cumsum()

    col_1   group_ids   value   consecutive_cumsum
0   Name_A  1   1   1
1   Name_B  2   4   5
2   Name_C  4   2   2
3   Name_D  5   10  12
4   Name_E  6   14  26

Upvotes: 3

Related Questions