Mainland

Reputation: 4564

Python Dataframe Find n rows rolling slope without for loop

I am trying to take n rows of the dataframe at a time and compute the slope over them (a rolling slope). The objective is not to use a for loop, because my df has 30k rows and a loop may be slow. So I would like to use a pandas function to compute the slope over each window of n rows.

My code:

import numpy as np
import pandas as pd
from scipy import stats

dfx = pd.DataFrame({'A': [10, 20, 15, 30, 1.5, 0.6, 7, 0.8, 90, 10]})
n = 2  # window size: number of samples per slope
cl_id = dfx.columns.tolist().index('A')  # positional index of column 'A', for use with .iloc
# pad the first n rows, then regress the positions [1, 2] on each window of values
l1 = ['NaN']*n + [stats.linregress(dfx.iloc[x+1-n:x+1, cl_id].tolist(), [1, 2])[0] for x in np.arange(n, len(dfx))]
dfx['slope'] = l1
print(dfx)
      A      slope
0  10.0        NaN
1  20.0        NaN  #stats.linregress([10,20],[1,2])[0] is missing here. Why?
2  15.0       -0.2  #stats.linregress([20,15],[1,2])[0] = -0.2
3  30.0  0.0666667  #stats.linregress([15,30],[1,2])[0] = 0.06667
4   1.5 -0.0350877
5   0.6   -1.11111
6   7.0    0.15625
7   0.8   -0.16129
8  90.0  0.0112108
9  10.0    -0.0125

Everything else works fine. Is there a more Pythonic way of doing this, such as using the rolling() function?

Upvotes: 1

Views: 220

Answers (1)

Mohsin hasan

Reputation: 837

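n = 2
# Regress the window positions (index + 1) on the window values, same argument order as in the question.
# raw=False passes each window as a Series, so x.index is available inside the lambda.
dfx.A.rolling(n).apply(lambda x: stats.linregress(x, x.index+1)[0], raw=False)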

Output:

0         NaN
1    0.100000
2   -0.200000
3    0.066667
4   -0.035088
5   -1.111111
6    0.156250
7   -0.161290
8    0.011211
9   -0.012500
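
With 30k rows, calling linregress once per window can still be slow, because rolling().apply() runs a Python function for every window. Below is a minimal sketch of a fully vectorized alternative, assuming NumPy 1.20 or later (for sliding_window_view) and a series at least n rows long. The helper name rolling_slope is hypothetical, not part of pandas or SciPy. It uses the least-squares slope formula directly and keeps the question's argument order: the window values are the x variable and the positions 1..n are the y variable.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_slope(values, n):
    """Hypothetical helper: slope of positions 1..n regressed on each window of values.

    Mirrors stats.linregress(window, [1, ..., n])[0] from the question.
    Assumes len(values) >= n.
    """
    a = np.asarray(values, dtype=float)
    windows = sliding_window_view(a, n)            # shape: (len(a) - n + 1, n)
    pos = np.arange(1, n + 1, dtype=float)         # y variable: position inside the window

    x_dev = windows - windows.mean(axis=1, keepdims=True)
    y_dev = pos - pos.mean()

    # Least-squares slope: sum(dx * dy) / sum(dx ** 2), computed for every window at once.
    # A window whose values are all equal has zero variance and yields NaN (0 / 0).
    slopes = (x_dev * y_dev).sum(axis=1) / (x_dev ** 2).sum(axis=1)

    out = np.full(a.shape, np.nan)
    out[n - 1:] = slopes                           # the first complete window ends at index n - 1
    return out


dfx = pd.DataFrame({'A': [10, 20, 15, 30, 1.5, 0.6, 7, 0.8, 90, 10]})
n = 2
dfx['slope'] = rolling_slope(dfx['A'], n)
print(dfx)
```

This also shows why row 1 was empty in the question: a window of n rows is first complete at index n - 1, so only n - 1 rows need padding. The original loop padded n rows and started at index n, which skipped that first window.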

Upvotes: 1
