Stefan

Reputation: 1934

Sum consecutive pandas rows only when a column has consecutive numbers

I have a dataframe like

import pandas as pd

df = pd.DataFrame({'i': [3, 4, 12, 25, 44, 45, 52, 53, 65, 66],
                   't': range(1, 11),
                   'v': range(0, 100, 10)})

i.e.

    i   t   v
0   3   1   0
1   4   2   10
2   12  3   20
3   25  4   30
4   44  5   40
5   45  6   50
6   52  7   60
7   53  8   70
8   65  9   80
9   66  10  90

I would like to add each value in column v to the value in the next row if i increases by exactly 1 from one row to the next; otherwise the row should be left unchanged. One can assume that at most two consecutive rows need to be summed, so the last row is ambiguous: it may or may not be part of a sum.

The resulting dataframe should look like:

    i   t   v
0   3   1   10
2   12  3   20
3   25  4   30
4   44  5   90
6   52  7   130
8   65  9   170

Obviously I could loop over the dataframe using .iterrows(), but there must be a smarter solution. I tried various combinations of shift, diff and groupby, but I cannot see how to do it.

Upvotes: 2

Views: 452

Answers (2)

BENY

Reputation: 323236

Let us try

out = df.groupby(df['i'].diff().ne(1).cumsum()).agg({'i':'first','t':'first','v':'sum'})
out
Out[11]:
    i  t    v
i            
1   3  1   10
2  12  3   20
3  25  4   30
4  44  5   90
5  52  7  130
6  65  9  170
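The result above is indexed by the block id rather than by the original row labels (0, 2, 3, 4, 6, 8) shown in the question. If those labels matter, one possible way to keep them is sketched below. It is only a sketch under assumptions: the column name `index` comes from `reset_index()` on an unnamed index, and the name `block` is chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'i': [3, 4, 12, 25, 44, 45, 52, 53, 65, 66],
                   't': range(1, 11),
                   'v': range(0, 100, 10)})

# Block id: goes up by one whenever i does not follow the previous i by exactly 1.
# 'block' is an illustrative name, used to avoid clashing with column 'i'.
blocks = df['i'].diff().ne(1).cumsum().rename('block')

# Turn the original row labels into a regular column ('index'),
# keep the first label of each block, then restore it as the index.
out = (df.reset_index()
         .groupby(blocks)
         .agg({'index': 'first', 'i': 'first', 't': 'first', 'v': 'sum'})
         .set_index('index')
         .rename_axis(None))

print(out)
```

With the sample data, this should keep the first row label of each block as the index of the summed row.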

Upvotes: 2

Quang Hoang

Reputation: 150745

A common technique is to label each block of consecutive values by taking cumsum over a flag derived from diff:

blocks = df['i'].diff().ne(1).cumsum()
df.groupby(blocks, as_index=False).agg({'i':'first','t':'first', 'v':'sum'})

Output:

    i  t    v
0   3  1   10
1  12  3   20
2  25  4   30
3  44  5   90
4  52  7  130
5  65  9  170
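To see why this labels the blocks, it can help to look at the intermediate steps one at a time. The snippet below is a small illustration only; the variable names `step` and `starts` and the column labels in the printed frame are made up for this sketch.

```python
import pandas as pd

i = pd.Series([3, 4, 12, 25, 44, 45, 52, 53, 65, 66], name='i')

step = i.diff()            # difference to the previous row; NaN for the first row
starts = step.ne(1)        # True where a new block begins (NaN != 1 is also True)
blocks = starts.cumsum()   # running count of block starts, used as the block id

# Show all steps side by side.
print(pd.DataFrame({'i': i, 'diff': step, 'new_block': starts, 'block': blocks}))
```

Rows that share a block id are aggregated together by the groupby. Note that a run of three or more consecutive values of i would also end up in a single block, so this matches the question only under its stated assumption that at most two consecutive rows occur.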

Upvotes: 2
