Reputation: 49
I have a dataframe (df) that is like the one below:
month-year name a b c
2018-01 X 2 1 4
2018-01 Y 1 0 5
2018-01 X 1 6 3
2018-01 Y 4 10 7
2018-02 X 13 4 2
2018-02 Y 22 13 9
2018-02 X 3 7 4
2018-02 Y 2 15 0
I want to group by month-year and name to get the sum of column a, the average of column b, and the std of column c. However, I want the sum, average, and std to be rolling/cumulative numbers.
For example, for this dataset, to find the output I want for a, I can do something like
df.groupby(['month-year','name']).agg(sum).groupby(level=[1]).agg({'a':np.cumsum})
to get something like
month-year name a
2018-01 X 3
Y 5
2018-02 X 19
Y 29
What can I do to find the cumulative average of b and std of c to get an output that looks like this?
month-year name a b c
2018-01 X 3 3.5 0.71
Y 5 5 1.41
2018-02 X 19 4.5 0.96
Y 29 9.5 3.86
Thank you.
Upvotes: 3
Views: 2901
Reputation: 59519
You can do this with expanding.
The first step is to calculate the expanding sum, mean and std for each of your columns, grouping only by 'name', and to join that back to the original DataFrame.
Then you want to groupby and select the last row within each ['month-year', 'name'] group.
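# Expanding (cumulative) statistics per name, aligned back to the original rows
df = df.join(df.groupby(['name']).expanding().agg({'a': 'sum', 'b': 'mean', 'c': 'std'})
               .reset_index(level=0, drop=True)
               .add_suffix('_roll'))
# Keep the last row of each month/name group and drop the raw columns
df.groupby(['month-year', 'name']).last().drop(columns=['a', 'b', 'c'])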
Output:
a_roll b_roll c_roll
month-year name
2018-01 X 3.0 3.5 0.707107
Y 5.0 5.0 1.414214
2018-02 X 19.0 4.5 0.957427
Y 29.0 9.5 3.862210
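For reference, the same idea can also be sketched column by column, which makes each cumulative statistic explicit. This is only an illustration that rebuilds the sample data from the question; the names `g` and `result` are made up here, and it assumes the rows are already in chronological order within each name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'month-year': ['2018-01'] * 4 + ['2018-02'] * 4,
    'name': ['X', 'Y', 'X', 'Y'] * 2,
    'a': [2, 1, 1, 4, 13, 22, 3, 2],
    'b': [1, 0, 6, 10, 4, 13, 7, 15],
    'c': [4, 5, 3, 7, 2, 9, 4, 0],
})

g = df.groupby('name')

# Cumulative sum per name, aligned with the original row index
df['a_roll'] = g['a'].cumsum()

# Expanding mean/std per name; the result has a (name, row) MultiIndex,
# so drop the 'name' level to align with df again
df['b_roll'] = g['b'].expanding().mean().reset_index(level=0, drop=True)
df['c_roll'] = g['c'].expanding().std().reset_index(level=0, drop=True)

# The last row of each (month-year, name) group holds the running totals
result = (df.groupby(['month-year', 'name'])[['a_roll', 'b_roll', 'c_roll']]
            .last())
print(result)
```

Note that the expanding std of the very first row of each name is NaN (a sample standard deviation needs at least two observations); it does not show up here because every group ends on a later row.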
Upvotes: 1