extarbags
extarbags

Reputation: 257

Pandas - cumsum by month?

I have a dataframe that looks like this:

Date          n
2014-02-27    4
2014-02-28    5
2014-03-01    1
2014-03-02    6
2014-03-03    7

I'm trying to get to one that looks like this

Date          n    csn
2014-02-27    4    4
2014-02-28    5    9
2014-03-01    1    1
2014-03-02    6    7
2014-03-03    7    14

...i.e. I want a column with the running total within the month and I want it to start over each month. How can I do this?

Upvotes: 7

Views: 9345

Answers (2)

Andy Hayden
Andy Hayden

Reputation: 375535

If you're doing timeseries work, I recommend using a DatetimeIndex. In this example, you can use a TimeGrouper to group by month (which groups by year-month, like in a resample):

In [11]: g = df.groupby(pd.TimeGrouper('M'))

In [12]: g['n'].cumsum()
Out[12]: 
Date
2014-02-27     4
2014-02-28     9
2014-03-01     1
2014-03-02     7
2014-03-03    14
dtype: int64

In [13]: df['csn'] = g['n'].cumsum()

Note: If you're not already using a DatetimeIndex, pass over the to_datetime function and set the index:

df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

Upvotes: 4

CT Zhu
CT Zhu

Reputation: 54340

Use .groupby(), but don't just group by month, groupby year-month instead. Or else 2013-02 will be in the same group as 2014-02, etc.

In [96]:

df['Month']=df['Date'].apply(lambda x: x[:7])
In [97]:

df['csn']=df.groupby(['Month'])['n'].cumsum()
In [98]:

print df
         Date  n    Month  csn
0  2014-02-27  4  2014-02    4
1  2014-02-28  5  2014-02    9
2  2014-03-01  1  2014-03    1
3  2014-03-02  6  2014-03    7
4  2014-03-03  7  2014-03   14

[5 rows x 4 columns]

Upvotes: 12

Related Questions