Kazza789
Kazza789

Reputation: 85

Pandas dataframe - how to create a conditional sum, where the condition depends on another column int the dataframe

I am trying to create a column that is a conditional cumulative sum in Pandas. I know how to do this easily in Excel but am really struggling to work out how I can achieve this in Pandas. Consider the following dataframe

df:

    Year    Income
0   2001    17
1   2001    4
1   2001    9
2   2002    6
2   2002    1
2   2002    1
3   2003    12
3   2003    11
3   2003    5

I would like to add a new column to the dataframe. The new column should contain the sum of all Incomes where:

  1. the value of 'Year' is the same as the value of 'Year' for the row we are calculating
  2. the value of 'Income' is less than or equal to the 'Income' value for that row that we are calculating

So the output would look like:

df:

    Year    Income  Cum_Income
0   2001    17      13          # 2 values less than 17 in 2001: 4 + 9 = 13
1   2001    4       0           # no values less than 4 in 2001: 0
1   2001    9       4           # 1 value less than 9 in 2001  : 4
2   2002    6       2           # etc.
2   2002    1       0           
2   2002    1       0           
3   2003    12      16          
3   2003    11      5           
3   2003    5       0           

The only way I can think of to achieve this is to loop through each row and calculate each cell in Cum_income separately, but I'm hopeful that there is a cleaner way than that. Any suggestions would be very helpful.

Upvotes: 1

Views: 193

Answers (1)

BENY
BENY

Reputation: 323226

Some magic from numpy broadcast

s1 = df.Year.values
s2 = df.Income.values
np.sum((s1==s1[:,None])*((s2<s2[:,None])*s2), axis=1)
Out[246]: array([13,  0,  4,  2,  0,  0, 16,  5,  0], dtype=int64)
#df['New'] = np.sum((s1==s1[:,None])*((s2<s2[:,None])*s2), axis=1)

Explain

(s1==s1[:,None]) match the year , if the year not same we do not check the value , in your case it will return False , value * False will be 0

(s2<s2[:,None])*s2 check the value of current row whether have the value is greater than it , if return True , then True multiple the row value will be the value for future to add up.

Upvotes: 1

Related Questions