user308827
user308827

Reputation: 21961

Computing cumulative sum in pandas dataframe across specific spans

I have the foll. dataframe:

            tot_pix  Season  caap_col    lamma     kite
datetime                                                  
2000-01-01   1914.0  2000.0       1.0  1.95025  117.737362
2000-01-04   1914.0  2000.0       1.0  1.95025  117.674177
2000-01-05   1914.0  2000.0       1.0  1.95025  117.995489
2001-01-04   1914.0  2001.0       1.0  1.95025  118.114809
2001-01-05   1914.0  2001.0       1.0  1.95025  118.160295

In the dataframe above, I want to compute cumulative sum of the kite column. However, I want the cumulative sum to NOT span across Season values. E.g. the output with cumulative sum for kite should look like so:

          tot_pix Season caap_col lamma kite
datetime                    
1/1/2000    1914    2000    1   1.95025 117.737362
1/4/2000    1914    2000    1   1.95025 235.411539
1/5/2000    1914    2000    1   1.95025 235.669666
1/4/2001    1914    2001    1   1.95025 118.114809
1/5/2001    1914    2001    1   1.95025 236.275104

I can compute cumulative sum using the cumsum command, how do I restrict it to specific spans of Season?

Upvotes: 2

Views: 1286

Answers (1)

jezrael
jezrael

Reputation: 862521

You need groupby + cumsum:

#if not sorted index with column Season
#df = df.sort_index(sort_remaining=True).sort_values('Season')

df['kite'] = df.groupby('Season')['kite'].cumsum()
print (df)
           tot_pix  Season  caap_col    lamma        kite
datetime                                                  
2000-01-01   1914.0  2000.0       1.0  1.95025  117.737362
2000-01-04   1914.0  2000.0       1.0  1.95025  235.411539
2000-01-05   1914.0  2000.0       1.0  1.95025  353.407028
2001-01-04   1914.0  2001.0       1.0  1.95025  118.114809
2001-01-05   1914.0  2001.0       1.0  1.95025  236.275104

Upvotes: 3

Related Questions