John
John

Reputation: 1465

Pandas: Getting a rolling sum while grouping by a column

I have a pandas dataframe that looks like

Name    Date               Value
Sarah   11-01-2015         3
Sarah   11-02-2015         2
Sarah   11-03-2015         27
Bill    11-01-2015         42
Bill    11-02-2015         5
Bill    11-03-2015         15
.... (a couple hundred rows)

How do I get a 30 day (or x day) rolling sum of these values broken out by whoever is in the 'Name' column? The ideal output would have the same columns as the current dataframe, but instead of having the values for each row be what that person had as a value for that day, it would be the cumulative sum of what their values over the past 30 days.

I know I can do

result = pd.rolling_sum(df, 30)

to get the rolling sum overall. But how do I return a dataframe with that rolling sum grouped by the 'Name' column?

Upvotes: 1

Views: 3666

Answers (2)

PSAfrance
PSAfrance

Reputation: 11

Note that if you don't need a precise temporal window, or if your dataset has 1 line per [day , user] (which seems to be your case), then the standard groupby of pandas is perfectly suited. See this very similar question

Otherwise, something like:

df.groupby('Name').rolling('30D', on="Date").Value.sum()

should work.

Upvotes: 0

John
John

Reputation: 1465

Figured it out using the grigri group_resample function.

df = group_resample(df,date_column='Date',groupby=group_by,value_column='Value',how='sum',freq='d')
df = df.unstack(group_by).fillna(0)
result = pd.rolling_mean(df,30)

Upvotes: 1

Related Questions