Reputation: 1939
I have a pandas DataFrame of the form:
import pandas as pd
df = pd.DataFrame({
'a': [1,2,3,4,5,6],
'b': [0,1,0,1,0,1]
})
I want to group the data by the value of 'b' and add new column 'c' which contains a rolling sum of 'a' for each group, then I want to recombine all the groups back into an ungrouped DataFrame which contains the 'c' column. I have got as far as:
for i, group in df.groupby('b'):
group['c'] = group.a.rolling(
window=2,
min_periods=1,
center=False
).sum()
But there are several problems with this approach:
Operating on each group using a for loop feels like it is going to be slow for a large DataFrame (like my actual data)
I can't find an elegant way to save column 'c' for each group and add it back to the original DataFrame. I could append c for each group to an array, zip it with an analagous index array, etc. but that seems very hacky. Is there a built-in pandas method that I am missing here?
Upvotes: 1
Views: 925
Reputation: 29721
If using groupby
is a must then, you could go with groupby.apply
to compute all in one go instead:
df['c'] = df.groupby('b')['a'].apply(lambda x: x.rolling(2, min_periods=1).sum())
Starting with v0.19.1
, you can directly call rolling()/expanding()
methods on groupby objects as shown:
df['c'] = df.groupby('b').rolling(2, min_periods=1)['a'].sum().sortlevel(1).values
Both giving you :-
df
Upvotes: 1