Reputation: 614
I have a dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(15, 3), columns=list('ACD'))
df['C > D'] = df['C'] > df['D']
           A         C         D  C > D
0   0.031469  0.104515  0.123596  False
1   0.549081  0.065270  0.036311   True
2   0.426498  0.674991  0.601090   True
3   0.759211  0.680903  0.601398   True
4   0.459308  0.801639  0.572331   True
5   0.691453  0.559478  0.959135  False
6   0.181677  0.091478  0.192358  False
7   0.186661  0.981368  0.721595   True
8   0.473044  0.603869  0.683941  False
9   0.015301  0.173707  0.304635  False
10  0.645700  0.300221  0.944034  False
11  0.087918  0.020047  0.720342  False
12  0.012420  0.017378  0.050286  False
13  0.496994  0.631002  0.618231   True
14  0.133083  0.454531  0.451067   True
I am trying to create a new column that holds the sum of column A over each contiguous same-valued group of column C > D.
The first value of C > D is False and the row right after it is not False, so this contiguous same-valued group consists of a single item (index 0), and its value is the sum of all elements of column A that fall into the group: 0.031469.
The next group, which starts where the value changes from False to True, consists of indices 1-4 (inclusive), all of which are True. So the value for this group would be the sum of:
1 0.549081
2 0.426498
3 0.759211
4 0.459308
which is roughly 2.2 (off the top of my head).
Upvotes: 0
Views: 39
Reputation: 150785
Contiguous runs of equal values can be identified by taking cumsum() over the non-zero differences, so you can do:
# label each contiguous run; print groups to see the details
groups = df['C > D'].diff().ne(0).cumsum()
# group by those labels and sum column A
df.groupby(groups)['A'].sum()
Output:
C > D
1 0.031469
2 2.194098
3 0.873130
4 0.186661
5 1.234383
6 0.630077
Name: A, dtype: float64
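Since the question asks for a new column rather than one row per group, a minimal sketch using transform('sum') might look like the following. It assumes a small hand-typed frame (the first seven rows of the question's data) so the result is reproducible, and it builds the labels with a shift-based comparison instead of diff(), which is just an alternative way to get the same run labels. The column name A_group_sum is made up for illustration.

```python
import pandas as pd

# Hand-typed sample: the first seven rows of columns A and 'C > D' from the question
df = pd.DataFrame({
    'A': [0.031469, 0.549081, 0.426498, 0.759211, 0.459308, 0.691453, 0.181677],
    'C > D': [False, True, True, True, True, False, False],
})

# A new run starts wherever the flag differs from the previous row;
# the cumulative sum of those change points gives one label per run
flag = df['C > D']
groups = flag.ne(flag.shift()).cumsum()

# transform('sum') returns one value per original row (not one per group),
# so the result can be assigned straight back as a column
# ('A_group_sum' is a hypothetical column name)
df['A_group_sum'] = df.groupby(groups)['A'].transform('sum')
print(df)
```

Every row in a run then carries the same total, so rows 1-4 all hold the sum of their four A values.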
Upvotes: 1