Reputation: 195
I am trying to get a cumulative count of a given column that excludes the current row in the dataframe.
My code is shown below. The problem with using cumsum() only is that it includes the current row in the count.
I want df['ExAnte Good Year Count'] to calculate cumsum on an ExAnte basis - ie. excluding the current row from the count.
d = {
'Year':[2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008],
'Good Year':[1, 0, 1, 0, 0, 1, 1, 1, 0],
'Year Type':['X', 'Y', 'Z', 'Z', 'Z', 'X', 'Y', 'Z', 'Z']
}
df = pd.DataFrame(d, columns=['Year','Good Year','Year Type'])
df['ExAnte Good Year Count'] = df['Good Year'].cumsum()
UPDATED QUERY: I would also like to count the cumsum of 'Good Years', grouped by Year Type. I have tried...
'df['Good Year'].groupby(['Year Type']).shift().cumsum()'
...but I get an error which says 'KeyError:'Year Type'
Upvotes: 6
Views: 3997
Reputation: 323226
df['Yourcol']=df.groupby('Year Type',sort=False)['Good Year'].apply(lambda x : x.shift().cumsum())
df
Out[283]:
Good Year Year Year Type Yourcol
0 1 2000 X NaN
1 0 2001 Y NaN
2 1 2002 Z NaN
3 0 2003 Z 1.0
4 0 2004 Z 1.0
5 1 2005 X 1.0
6 1 2006 Y 0.0
7 1 2007 Z 1.0
8 0 2008 Z 2.0
Upvotes: 3
Reputation: 41
what about this one?
df['ExAnte Good Year Count'] = df['Good Year'].shift().cumsum()
The result should be the following:
Year Good Year ExAnte Good Year Count
0 2000 1 NaN
1 2001 0 1.0
2 2002 1 1.0
3 2003 0 2.0
4 2004 0 2.0
5 2005 1 2.0
6 2006 1 3.0
7 2007 1 4.0
8 2008 0 5.0
Upvotes: 3