Sums of Column A values for contiguous same-value boolean values of a separate column

Question

I have a dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(15, 3), columns=list('ACD'))
df['C > D'] = df['C'] > df['D']

    A            C          D           C > D
0   0.031469    0.104515    0.123596    False
1   0.549081    0.065270    0.036311    True
2   0.426498    0.674991    0.601090    True
3   0.759211    0.680903    0.601398    True
4   0.459308    0.801639    0.572331    True
5   0.691453    0.559478    0.959135    False
6   0.181677    0.091478    0.192358    False
7   0.186661    0.981368    0.721595    True
8   0.473044    0.603869    0.683941    False
9   0.015301    0.173707    0.304635    False
10  0.645700    0.300221    0.944034    False
11  0.087918    0.020047    0.720342    False
12  0.012420    0.017378    0.050286    False
13  0.496994    0.631002    0.618231    True
14  0.133083    0.454531    0.451067    True

What I am attempting to do:

I am trying to create a new column that holds the sum of column A over each contiguous same-valued group of column C > D.

The first value of C > D is False and the value right after it is not False, so this contiguous same-valued group consists of one item (index 0). Its value is the sum of all elements of column A that fall into the group: 0.031469.

The next group starts where the value changes (from False to True) and consists of indices 1-4 (inclusive), which all hold True. So the value for this group would be the sum of:

1   0.549081    
2   0.426498    
3   0.759211    
4   0.459308

which is something like 2.201 (off the top of my head).

Quang Hoang · Accepted Answer

Contiguous runs of the same value can be identified with cumsum() on the non-zero differences: every change of value starts a new group number. So you can do:

# label each contiguous run; print groups to see details
groups = df['C > D'].diff().ne(0).cumsum()

# group by the run labels and sum column A within each run
df.groupby(groups)['A'].sum()

Output:

C > D
1    0.031469
2    2.194098
3    0.873130
4    0.186661
5    1.234383
6    0.630077
Name: A, dtype: float64
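
Since the question asks for a new column rather than one row per group, a minimal sketch along these lines may help. It assumes the sample values shown in the question, labels the runs by comparing each value with its shift() (an equivalent alternative to the diff() approach), and uses the made-up column name A_run_sum purely for illustration:

```python
import pandas as pd

# Column A and the boolean column, copied from the question's sample data
df = pd.DataFrame({
    'A': [0.031469, 0.549081, 0.426498, 0.759211, 0.459308,
          0.691453, 0.181677, 0.186661, 0.473044, 0.015301,
          0.645700, 0.087918, 0.012420, 0.496994, 0.133083],
    'C > D': [False, True, True, True, True, False, False, True,
              False, False, False, False, False, True, True],
})

# A new run starts wherever the value differs from the previous row;
# cumsum() over those change flags gives each run its own label
groups = df['C > D'].ne(df['C > D'].shift()).cumsum()

# transform('sum') broadcasts each run's total back onto its rows.
# 'A_run_sum' is a hypothetical column name, not from the original post.
df['A_run_sum'] = df.groupby(groups)['A'].transform('sum')

print(df)
```

Unlike sum(), which returns one row per group, transform('sum') returns a result aligned with the original index, so it can be assigned directly as a column.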
