Reputation: 3711
I want to sum the values in one column based on the values in another in Pandas. See figure below.
In one column(A) is a value of 1 or 0, and in column B another a value. Every time the value in the first column = 1 I need to add(sum) the values in the second column in between two ones.
In other words between two events (indicated by one), I need to add the total amount of other things (column b) that happened.
Upvotes: 2
Views: 1946
Reputation: 863031
Use groupby
with transform
for sum
s and numpy.where
for add empty strings or NaN
s if need numeric column:
a = df['A'].shift().cumsum().fillna(df['A'].iat[0])
df['C'] = np.where(a.duplicated(keep='last'), '', df['B'].groupby(a).transform('sum'))
print (df)
A B C
0 1 5
1 0 2
2 0 1
3 0 5
4 1 4 17
5 0 4
6 0 2
7 0 4
8 1 4 14
df['C'] = np.where(a.duplicated(keep='last'), np.nan, df['B'].groupby(a).transform('sum'))
print (df)
A B C
0 1 5 NaN
1 0 2 NaN
2 0 1 NaN
3 0 5 NaN
4 1 4 17.0
5 0 4 NaN
6 0 2 NaN
7 0 4 NaN
8 1 4 14.0
Detail:
print (a)
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
5 2.0
6 2.0
7 2.0
8 2.0
Name: A, dtype: float64
Upvotes: 3
Reputation: 1
Why not sum the columns multiplied by each other:
Iterate a for loop through your table, and when A==1
, it will add 1*B
to the sum, and when A==0
, it will add 0*B
to the sum.
Then if you need to get the sum of samples where A==0
, you iterate the product (1-A)*B
, so that the process in the sentence above is inverted.
Upvotes: 0