Reputation: 4564
I am trying to access n rows of a dataframe at a time and compute the mean. The objective is not to use a for loop, because my df has 30k rows and a loop may be slow. So the goal is to use a pandas function to compute the mean over n rows.
My code:
import numpy as np
import pandas as pd
from scipy import stats
dfx = pd.DataFrame({'A':[10,20,15,30,1.5,0.6,7,0.8,90,10]})
n=2 ## n to cover n samples
cl_id = dfx.columns.tolist().index('A') ### cl_id for index number of the column for using in .iloc
l1=['NaN']*n+[stats.linregress(dfx.iloc[x+1-n:x+1,cl_id].tolist(),[1,2])[0] for x in np.arange(n,len(dfx))]
dfx['slope'] = l1
print(dfx)
      A      slope
0  10.0        NaN
1  20.0        NaN  # stats.linregress([10,20],[1,2])[0] is missing here. Why?
2  15.0       -0.2  # stats.linregress([20,15],[1,2])[0] = -0.2
3  30.0  0.0666667  # stats.linregress([15,30],[1,2])[0] = 0.06667
4   1.5 -0.0350877
5   0.6   -1.11111
6   7.0    0.15625
7   0.8   -0.16129
8  90.0  0.0112108
9  10.0    -0.0125
Everything else works fine. Is there a more Pythonic way of doing this, such as using rolling()?
Upvotes: 1
Views: 220
Reputation: 837
n = 2
dfx.A.rolling(n).apply(lambda x: stats.linregress(x, x.index+1)[0], raw=False)
Output:
0 NaN
1 0.100000
2 -0.200000
3 0.066667
4 -0.035088
5 -1.111111
6 0.156250
7 -0.161290
8 0.011211
9 -0.012500
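Row 1 shows 0.1 here but NaN in the question because the question's list comprehension starts at x = n, so the first full window (rows 0 to n-1) is never evaluated. Note also that rolling().apply still calls linregress once per window in Python, which may matter at 30k rows. Below is a minimal sketch of a fully vectorized alternative, assuming the slope can be written as cov(A, y) / var(A) (the ordinary least-squares slope with A as the regressor, which is how linregress is called above). The helper names rolling_slope_apply and rolling_slope_vectorized are hypothetical, not part of pandas or scipy.

```python
import numpy as np
import pandas as pd
from scipy import stats


def rolling_slope_apply(s, n):
    # Same idea as the one-liner above, but raw=True passes each window
    # as a plain ndarray, so the y values are built once instead of
    # being derived from the window's index.
    y = np.arange(1, n + 1)
    return s.rolling(n).apply(lambda w: stats.linregress(w, y)[0], raw=True)


def rolling_slope_vectorized(s, n):
    # Row positions play the role of y. Covariance is unaffected by a
    # constant shift, so positions i-n+1..i behave like 1..n.
    t = pd.Series(np.arange(len(s), dtype=float), index=s.index)
    # OLS slope of y on x is cov(x, y) / var(x); both rolling statistics
    # use the same ddof, so the normalisation cancels.
    return s.rolling(n).cov(t) / s.rolling(n).var()


dfx = pd.DataFrame({'A': [10, 20, 15, 30, 1.5, 0.6, 7, 0.8, 90, 10]})
n = 2
dfx['slope'] = rolling_slope_vectorized(dfx['A'], n)
print(dfx)
```

A window whose values are all identical has zero variance, so the vectorized version returns a non-finite value there; check for that if your data can contain flat stretches.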
Upvotes: 1