Reputation: 1934
I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'i': [3, 4, 12, 25, 44, 45, 52, 53, 65, 66],
                   't': range(1, 11),
                   'v': range(0, 100, 10)})
i.e.
    i   t   v
0   3   1   0
1   4   2  10
2  12   3  20
3  25   4  30
4  44   5  40
5  45   6  50
6  52   7  60
7  53   8  70
8  65   9  80
9  66  10  90
I would like to add the value in column v to the value in the next row whenever i increases by exactly 1 from one row to the next; otherwise the row should be left unchanged.
One can assume that at most two consecutive rows ever need to be summed, so whether the last row appears in the result depends on whether it was summed into the row before it.
The resulting dataframe should look like:
    i  t    v
0   3  1   10
2  12  3   20
3  25  4   30
4  44  5   90
6  52  7  130
8  65  9  170
Obviously I could loop over the dataframe using .iterrows(), but there must be a smarter solution.
I tried various combinations of shift, diff and groupby, but I cannot see how to do it.
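For reference, the loop-based baseline mentioned above might look roughly like the sketch below. It is only an illustration of the intended behavior, not a recommended solution; the names `rows`, `prev_i` and `result` are made up for this example, and the loop merges any run of consecutive `i` values, which matches the requirement as long as runs are at most two rows long.

```python
import pandas as pd

df = pd.DataFrame({'i': [3, 4, 12, 25, 44, 45, 52, 53, 65, 66],
                   't': range(1, 11),
                   'v': range(0, 100, 10)})

rows = []      # one dict per output row (hypothetical helper list)
prev_i = None  # value of i in the previous input row

for idx, row in df.iterrows():
    if prev_i is not None and row['i'] == prev_i + 1:
        # i increased by exactly 1: fold v into the previous output row
        rows[-1]['v'] += row['v']
    else:
        # otherwise start a new output row, remembering the original label
        rows.append({'idx': idx, 'i': row['i'], 't': row['t'], 'v': row['v']})
    prev_i = row['i']

result = pd.DataFrame(rows).set_index('idx').rename_axis(None)
print(result)
```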
Upvotes: 2
Views: 452
Reputation: 323236
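Let us try grouping on a block id built from diff and cumsum:
# a new block starts wherever i did not increase by exactly 1
out = df.groupby(df['i'].diff().ne(1).cumsum()).agg({'i':'first','t':'first','v':'sum'})
out
Out[11]: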
    i  t    v
i
1   3  1   10
2  12  3   20
3  25  4   30
4  44  5   90
5  52  7  130
6  65  9  170
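Note that the result above is indexed by block id (1 to 6), whereas the expected output in the question keeps the original row labels (0, 2, 3, 4, 6, 8). If those labels matter, one possible variation, sketched here under that assumption, is to broadcast the per-block sum back onto every row with `transform` and then keep only the first row of each block. The names `blocks`, `first_rows` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'i': [3, 4, 12, 25, 44, 45, 52, 53, 65, 66],
                   't': range(1, 11),
                   'v': range(0, 100, 10)})

# Block id: increments whenever i does not increase by exactly 1
blocks = df['i'].diff().ne(1).cumsum()

# True on the first row of each block, False on the rows merged into it
first_rows = ~blocks.duplicated()

# Replace v by its per-block sum, then keep only the first row of each block
out = df.assign(v=df.groupby(blocks)['v'].transform('sum')).loc[first_rows]
print(out)
```

Because the filtering uses a boolean mask on the original frame, the surviving rows keep their original index labels.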
Upvotes: 2
Reputation: 150745
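A common technique is to identify blocks of consecutive values with cumsum on diff:
# the block id increments whenever i does not increase by exactly 1
blocks = df['i'].diff().ne(1).cumsum()
# as_index=False keeps the block id out of the result's index
df.groupby(blocks, as_index=False).agg({'i':'first','t':'first', 'v':'sum'})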
Output:
    i  t    v
0   3  1   10
1  12  3   20
2  25  4   30
3  44  5   90
4  52  7  130
5  65  9  170
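To see why cumsum on diff yields block ids, it can help to look at the intermediate steps one at a time. The sketch below rebuilds the question's dataframe and prints each stage side by side; the names `step`, `starts` and `steps` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'i': [3, 4, 12, 25, 44, 45, 52, 53, 65, 66],
                   't': range(1, 11),
                   'v': range(0, 100, 10)})

step = df['i'].diff()     # NaN for the first row, then i[n] - i[n-1]
starts = step.ne(1)       # True where a new block begins (NaN != 1 is True)
blocks = starts.cumsum()  # running count of block starts, used as block id

# Put the stages next to each other for inspection
steps = pd.DataFrame({'i': df['i'], 'diff': step,
                      'new_block': starts, 'block': blocks})
print(steps)
```

Rows that share a value in the `block` column end up in the same group, so a run of consecutive `i` values of any length is collapsed into one row, not only pairs.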
Upvotes: 2