Reputation: 195

Python Pandas Get a Cumulative Sum (cumsum) which excludes the current row

I am trying to get a cumulative count of a given column that excludes the current row in the dataframe.

My code is shown below. The problem with using cumsum() only is that it includes the current row in the count.

I want df['ExAnte Good Year Count'] to calculate cumsum on an ExAnte basis - ie. excluding the current row from the count.

d = {
      'Year':[2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008], 
      'Good Year':[1, 0, 1, 0, 0, 1, 1, 1, 0],
      'Year Type':['X', 'Y', 'Z', 'Z', 'Z', 'X', 'Y', 'Z', 'Z']
    }

df = pd.DataFrame(d, columns=['Year','Good Year','Year Type'])
df['ExAnte Good Year Count'] = df['Good Year'].cumsum()

UPDATED QUERY: I would also like to count the cumsum of 'Good Years', grouped by Year Type. I have tried...

'df['Good Year'].groupby(['Year Type']).shift().cumsum()'

...but I get an error which says 'KeyError:'Year Type'

Upvotes: 6

Answers (2)

BENY

Reputation: 323226

df['Yourcol']=df.groupby('Year Type',sort=False)['Good Year'].apply(lambda x : x.shift().cumsum())
df
Out[283]: 
   Good Year  Year Year Type  Yourcol
0          1  2000         X      NaN
1          0  2001         Y      NaN
2          1  2002         Z      NaN
3          0  2003         Z      1.0
4          0  2004         Z      1.0
5          1  2005         X      1.0
6          1  2006         Y      0.0
7          1  2007         Z      1.0
8          0  2008         Z      2.0

Upvotes: 3

Paco2074

Reputation: 41

what about this one?

df['ExAnte Good Year Count'] = df['Good Year'].shift().cumsum()

The result should be the following:

   Year  Good Year  ExAnte Good Year Count
0  2000          1                     NaN
1  2001          0                     1.0
2  2002          1                     1.0
3  2003          0                     2.0
4  2004          0                     2.0
5  2005          1                     2.0
6  2006          1                     3.0
7  2007          1                     4.0
8  2008          0                     5.0

Upvotes: 3

Python Pandas Get a Cumulative Sum (cumsum) which excludes the current row

Answers (2)

Related Questions