Sum consecutive pandas rows only when a column has consecutive numbers

Question

I have a dataframe like

import pandas as pd

df = pd.DataFrame({'i': [3, 4, 12, 25, 44, 45, 52, 53, 65, 66],
                   't': range(1, 11),
                   'v': range(0, 100, 10)})

i.e.

    i   t   v
0   3   1   0
1   4   2   10
2   12  3   20
3   25  4   30
4   44  5   40
5   45  6   50
6   52  7   60
7   53  8   70
8   65  9   80
9   66  10  90

I would like to add each value in column v to the value in the next row whenever i increases by exactly 1 from one row to the next; otherwise the row should be left as it is. One can assume that at most two consecutive rows need to be summed, so the last row is either merged into the previous row or kept on its own, depending on whether its i is consecutive.

The resulting dataframe should look like:

    i   t   v
0   3   1   10
2   12  3   20
3   25  4   30
4   44  5   90
6   52  7   130
8   65  9   170

Obviously I could loop over the dataframe using .iterrows() but there must be a smarter solution. I tried various combinations of shift, diff and groupby, though I cannot see the way to do it...

Quang Hoang · Accepted Answer

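A common technique is to label each run of consecutive values with cumsum on diff. diff() equals 1 wherever i continues the previous row, so .ne(1) marks the first row of every block and .cumsum() turns those marks into block ids to group on: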

blocks = df['i'].diff().ne(1).cumsum()
df.groupby(blocks, as_index=False).agg({'i':'first','t':'first', 'v':'sum'})

Output:

    i  t    v
0   3  1   10
1  12  3   20
2  25  4   30
3  44  5   90
4  52  7  130
5  65  9  170
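To make the block labelling easier to follow, here is a minimal sketch that spells out the intermediate Series for the question's data. The names step, starts, result and labeled are only illustrative, and the final variant (keeping the original row labels 0, 2, 3, 4, 6, 8 as in the question's expected output) is one possible approach, not part of the answer above. Note that this technique sums runs of any length, not just pairs.

```python
import pandas as pd

df = pd.DataFrame({'i': [3, 4, 12, 25, 44, 45, 52, 53, 65, 66],
                   't': range(1, 11),
                   'v': range(0, 100, 10)})

# Difference to the previous row; NaN for the very first row.
step = df['i'].diff()

# True on every row that does NOT continue the previous one,
# i.e. on the first row of each block (NaN != 1 is True as well).
starts = step.ne(1)

# Running count of block starts gives one id per block.
# Renamed so the grouping key cannot be confused with column 'i'.
blocks = starts.cumsum().rename('block')

# Same aggregation as in the answer, with a fresh 0..n-1 index.
result = (df.groupby(blocks)
            .agg({'i': 'first', 't': 'first', 'v': 'sum'})
            .reset_index(drop=True))

# Illustrative variant: keep the first row of each block with its
# original index label and overwrite v with the per-block sum.
labeled = df.loc[starts].assign(v=df.groupby(blocks)['v'].transform('sum'))

print(blocks.tolist())
print(result)
print(labeled)
```

The second variant relies on assign aligning the transformed sums on the index, so each kept row picks up the total of its own block.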
