Pandas - Sum values in one column in between values in another

Question

I want to sum the values in one column based on the values in another in Pandas. See figure below.

In one column(A) is a value of 1 or 0, and in column B another a value. Every time the value in the first column = 1 I need to add(sum) the values in the second column in between two ones.

In other words between two events (indicated by one), I need to add the total amount of other things (column b) that happened.

jezrael · Accepted Answer

Use groupby with transform for sums and numpy.where for add empty strings or NaNs if need numeric column:

a = df['A'].shift().cumsum().fillna(df['A'].iat[0])
df['C'] = np.where(a.duplicated(keep='last'), '', df['B'].groupby(a).transform('sum')) 

print (df)
   A  B   C
0  1  5    
1  0  2    
2  0  1    
3  0  5    
4  1  4  17
5  0  4    
6  0  2    
7  0  4    
8  1  4  14

df['C'] = np.where(a.duplicated(keep='last'), np.nan, df['B'].groupby(a).transform('sum'))
print (df)
   A  B     C
0  1  5   NaN
1  0  2   NaN
2  0  1   NaN
3  0  5   NaN
4  1  4  17.0
5  0  4   NaN
6  0  2   NaN
7  0  4   NaN
8  1  4  14.0

Detail:

print (a)
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
5    2.0
6    2.0
7    2.0
8    2.0
Name: A, dtype: float64

Pandas - Sum values in one column in between values in another

Answers (2)

Related Questions