fred.schwartz

Reputation: 2155

Pandas average of last x rows

For every row whose index is greater than 10, I'd like to set all values to the average of the last 3 rows before index 10 (rows 9, 8 and 7).

So far I have this:

df.loc[df.index>10,columns_list]=df.loc[df.index<10 & df.index>=7,columns_list].values.mean

Upvotes: 2

Views: 9706

Answers (2)

bharatk

Reputation: 4315

Use tail() to select the last 3 rows and np.mean() to average them.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A':[1,2,3,4,9,7,6,5,12,45,90,2323],'B':[9,8,7,6,5,4,3,2,1,0,12,11]})
print(df)
# boolean mask for rows whose index is less than 10
m = df.index < 10
# tail(3) returns the last 3 of those rows; axis=0 averages each column
mean = np.mean(df.loc[m, ['A', 'B']].tail(3), axis=0)
print(mean)

Output:

DataFrame:

       A   B
0      1   9
1      2   8
2      3   7
3      4   6
4      9   5
5      7   4
6      6   3
7      5   2
8     12   1
9     45   0
10    90  12
11  2323  11

Mean value:

A    20.666667
B     1.000000
dtype: float64
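The example above stops at computing the mean. A minimal sketch of the remaining step, writing that mean into the rows with index greater than 10, could look like the following. It reuses the same sample frame; the cast to float and the use of to_numpy() are my assumptions, chosen so the fractional mean fits the columns and the values broadcast across rows without relying on label alignment.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 9, 7, 6, 5, 12, 45, 90, 2323],
                   'B': [9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 12, 11]})
cols = ['A', 'B']

# Column-wise mean of the last 3 rows whose index is below 10 (rows 7, 8, 9).
mean = df.loc[df.index < 10, cols].tail(3).mean()

# Assumption: cast to float first so a fractional mean can be stored.
df[cols] = df[cols].astype(float)

# Overwrite every row with index > 10; the 1-D array broadcasts across rows.
df.loc[df.index > 10, cols] = mean.to_numpy()
print(df.tail(3))
```

In this sample frame only row 11 has an index greater than 10, so it is the only row that changes; row 10 keeps its original values.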

Upvotes: 0

jezrael

Reputation: 862781

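You are close. You need parentheses around each condition, because & binds more tightly than the comparison operators, and you need to call mean with axis=0 so that NumPy averages each column separately: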

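import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=(15, 3)), columns=list('abc'))

cols = ['a', 'b']
# mean of rows 7, 8, 9 per column, assigned to every row with index > 10
df.loc[df.index > 10, cols] = df.loc[(df.index < 10) & (df.index >= 7), cols].values.mean(axis=0)

print(df)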
      a         b  c
0   2.0  2.000000  6
1   1.0  3.000000  9
2   6.0  1.000000  0
3   1.0  9.000000  0
4   0.0  9.000000  3
5   4.0  0.000000  0
6   4.0  1.000000  7
7   3.0  2.000000  4
8   7.0  2.000000  4
9   8.0  0.000000  7
10  9.0  3.000000  4
11  6.0  1.333333  5
12  6.0  1.333333  1
13  6.0  1.333333  5
14  6.0  1.333333  6
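If the same operation is needed for different windows, the one-liner can be wrapped in a small helper. This is only a sketch: the function name fill_after_with_window_mean and its parameters are hypothetical, not part of pandas, and it works on a copy with the target columns cast to float (an assumption, so that fractional means can be stored).

```python
import numpy as np
import pandas as pd

def fill_after_with_window_mean(df, cols, start, stop, after):
    """Hypothetical helper: set rows with index > after to the column
    means of the rows where start <= index < stop."""
    out = df.copy()
    out[cols] = out[cols].astype(float)
    # Each comparison is parenthesized because & binds tighter than >= and <.
    window = (out.index >= start) & (out.index < stop)
    out.loc[out.index > after, cols] = out.loc[window, cols].to_numpy().mean(axis=0)
    return out

# Deterministic demo data (an assumption, not the random frame used above).
demo = pd.DataFrame({'a': np.arange(15), 'b': np.arange(15) * 2, 'c': np.arange(15)})
result = fill_after_with_window_mean(demo, ['a', 'b'], start=7, stop=10, after=10)
print(result.tail(5))
```

Here rows 7, 8 and 9 supply the window, so rows 11 to 14 receive 8.0 in column a and 16.0 in column b, while column c and row 10 are left untouched.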

Upvotes: 2
