Reputation:
Hello I have a dataframe:
import pandas as pd
df1 = {'name': ["x","x","x","x","x","x","x","y","y","y","y","y","y","y"],
'a': [3,4,5,11,14,15,16,2,3,4,10,13,14,15],
'b': [9,8,7,12,23,22,21,8,7,6,11,22,21,20],
'val': [2,1,3,4,5,6,3,21,11,31,41,51,61,31]
}
df1 = pd.DataFrame (df1, columns = ['name','a','b','val'])
I wish to sum the numbers in the 'val' column if the numbers in the 'a' column are next to one another. E.g. in 'a' you have 3,4,5 (all next to each other) so add together their associated numbers in the 'val' column (i.e. 2+1+3) and then create a new column where the added value is present. The harder bit for me is grouping these by 'name'.
I don't know how well I've explained this, but here is the dataframe i wish to end up with
df2 = {'name': ["x","x","x","x","x","x","x","y","y","y","y","y","y","y"],
'a': [3,4,5,11,14,15,16,2,3,4,10,13,14,15],
'b': [9,8,7,12,23,22,21,8,7,6,11,22,21,20],
'val': [2,1,3,4,5,6,3,21,11,31,41,51,61,31],
'sum_val': [6,6,6,4,14,14,14,63,63,63,41,143,143,143]
}
df2 = pd.DataFrame (df2, columns = ['name','a','b','val','sum_val'])
Upvotes: 1
Views: 173
Reputation: 863301
Create groups by compare difference for not equal with cumulative sum per groups in lambda function and pass Series
to GroupBy.transform
with sum
:
g = df1.groupby('name')['a'].apply(lambda x: x.diff().ne(1).cumsum())
df1['sum_val'] = df1.groupby([g, 'name'])['val'].transform('sum')
print (df1)
name a b val sum_val
0 x 3 9 2 6
1 x 4 8 1 6
2 x 5 7 3 6
3 x 11 12 4 4
4 x 14 23 5 14
5 x 15 22 6 14
6 x 16 21 3 14
7 y 2 8 21 63
8 y 3 7 11 63
9 y 4 6 31 63
10 y 10 11 41 41
11 y 13 22 51 143
12 y 14 21 61 143
13 y 15 20 31 143
Upvotes: 1