john
john

Reputation: 35390

How do I calculate the mean of all subsequent rows in pandas efficiently?

Given a dataframe that looks something like this:

date,score
2019-10-01,5
2019-10-02,4
2019-10-03,3
2019-10-04,6

How do I go about calculating the mean of score using subsequent/following rows, such that it looks/behaves like this:

date,score
2019-10-01,5,(5+4+3+6)/4
2019-10-02,4,(4+3+6)/3
2019-10-03,3,(3+6)/2
2019-10-04,6,6

This is super easy in SQL which is where I am trying to translate this from, where in SQL I can write:

select avg(score) over(order by date) ...

But I'm having trouble trying to figure this out in pandas.

Any guidance would be greatly appreciated.

Thank you!

Upvotes: 1

Views: 164

Answers (3)

Quang Hoang
Quang Hoang

Reputation: 150755

cumsum on reverse series:

df['cum_mean'] = (df[::-1].assign(c=1)
                    .agg({'score':'cumsum', 'c':'cumsum'})
                    .assign(cum_mean = lambda x: x['score']/x['c'])
                    ['cum_mean']
                 )

Output:

         date  score  cum_mean
0  2019-10-01      5  4.500000
1  2019-10-02      4  4.333333
2  2019-10-03      3  4.500000
3  2019-10-04      6  6.000000

Upvotes: 0

Andy L.
Andy L.

Reputation: 25249

Try expanding on the reversed series

df['calc_mean'] = df.score[::-1].expanding(1).mean()

Out[228]:
         date  score  calc_mean
0  2019-10-01      5   4.500000
1  2019-10-02      4   4.333333
2  2019-10-03      3   4.500000
3  2019-10-04      6   6.000000

Upvotes: 1

rafaelc
rafaelc

Reputation: 59274

Use cumsum bottom-up and divide by the arange

df['cummean'] = df.score[::-1].cumsum()[::-1] / np.arange(len(df), 0, -1)

         date  score   cummean
0  2019-10-01      5  4.500000
1  2019-10-02      4  4.333333
2  2019-10-03      3  4.500000
3  2019-10-04      6  6.000000

Upvotes: 1

Related Questions