Tooblippe
Tooblippe

Reputation: 3711

Pandas - Sum values in one column in between values in another

I want to sum the values in one column based on the values in another in Pandas. See figure below.

In one column(A) is a value of 1 or 0, and in column B another a value. Every time the value in the first column = 1 I need to add(sum) the values in the second column in between two ones.

In other words between two events (indicated by one), I need to add the total amount of other things (column b) that happened.

enter image description here

Upvotes: 2

Views: 1946

Answers (2)

jezrael
jezrael

Reputation: 863031

Use groupby with transform for sums and numpy.where for add empty strings or NaNs if need numeric column:

a = df['A'].shift().cumsum().fillna(df['A'].iat[0])
df['C'] = np.where(a.duplicated(keep='last'), '', df['B'].groupby(a).transform('sum')) 

print (df)
   A  B   C
0  1  5    
1  0  2    
2  0  1    
3  0  5    
4  1  4  17
5  0  4    
6  0  2    
7  0  4    
8  1  4  14

df['C'] = np.where(a.duplicated(keep='last'), np.nan, df['B'].groupby(a).transform('sum'))
print (df)
   A  B     C
0  1  5   NaN
1  0  2   NaN
2  0  1   NaN
3  0  5   NaN
4  1  4  17.0
5  0  4   NaN
6  0  2   NaN
7  0  4   NaN
8  1  4  14.0

Detail:

print (a)
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
5    2.0
6    2.0
7    2.0
8    2.0
Name: A, dtype: float64

Upvotes: 3

Why not sum the columns multiplied by each other:

Iterate a for loop through your table, and when A==1, it will add 1*B to the sum, and when A==0, it will add 0*B to the sum.

Then if you need to get the sum of samples where A==0, you iterate the product (1-A)*B, so that the process in the sentence above is inverted.

Upvotes: 0

Related Questions