Reputation: 1560
I am trying to find the fastest and most efficient way to calculate slopes using Numpy and Scipy. I have a data set of three Y variables and one X variable and I need to calculate their individual slopes. For example, I can easily do this one row at a time, as shown below, but I was hoping there was a more efficient way of doing this. I also don't think linregress is the best way to go because I don't need any of the auxiliary variables like intercept, standard error, etc. in my results. Any help is greatly appreciated.
import numpy as np
from scipy import stats
Y = np.array([[2.62710000e+11, 3.14454000e+11, 3.63609000e+11, 4.03196000e+11,
               4.21725000e+11, 2.86698000e+11, 3.32909000e+11, 4.01480000e+11,
               4.21215000e+11, 4.81202000e+11],
              [3.11612352e+03, 3.65968334e+03, 4.15442691e+03, 4.52470938e+03,
               4.65011423e+03, 3.10707392e+03, 3.54692896e+03, 4.20656404e+03,
               4.34233412e+03, 4.88462501e+03],
              [2.21536396e+01, 2.59098311e+01, 2.97401268e+01, 3.04784552e+01,
               3.13667639e+01, 2.76377113e+01, 3.27846013e+01, 3.73223417e+01,
               3.51249997e+01, 4.42563658e+01]])
X = np.array([1990., 1991., 1992., 1993., 1994., 1995., 1996., 1997., 1998., 1999.])
slope_0, intercept, r_value, p_value, std_err = stats.linregress(X, Y[0,:])
slope_1, intercept, r_value, p_value, std_err = stats.linregress(X, Y[1,:])
slope_2, intercept, r_value, p_value, std_err = stats.linregress(X, Y[2,:])
slope_0 = slope_0 / Y[0,:][0]
slope_1 = slope_1 / Y[1,:][0]
slope_2 = slope_2 / Y[2,:][0]
b, a = np.polyfit(X, Y[1,:], 1)
slope_1_a = b / Y[1,:][0]
Upvotes: 39
Views: 197226
Reputation: 222511
The fastest and most efficient way would be to use the native scipy function scipy.stats.linregress, which calculates everything:
slope : slope of the regression line
intercept : intercept of the regression line
r-value : correlation coefficient
p-value : two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero
stderr : Standard error of the estimate
And here is an example:
a = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
b = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
from scipy.stats import linregress
linregress(a, b)
will return:
LinregressResult(slope=0.20833333333333337, intercept=13.375, rvalue=0.14499815458068521, pvalue=0.68940144811669501, stderr=0.50261704627083648)
P.S. Just the mathematical formula for the slope: slope = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) / Σᵢ(xᵢ − x̄)².
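For instance, a quick check of that formula against linregress for the a/b data above (manual_slope is just a name for the hand-computed value):

import numpy as np
from scipy.stats import linregress

a = np.array([15, 12, 8, 8, 7, 7, 7, 6, 5, 3])
b = np.array([10, 25, 17, 11, 13, 17, 20, 13, 9, 15])

# slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
manual_slope = ((a - a.mean()) * (b - b.mean())).sum() / ((a - a.mean()) ** 2).sum()

print(manual_slope)            # ~0.20833
print(linregress(a, b).slope)  # same value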
Upvotes: 73
Reputation: 101
The way I did it is with the np.diff() function, which gives the point-to-point slope between each pair of consecutive samples (rather than a single fitted slope):
dx = np.diff(xvals)
dy = np.diff(yvals)
slopes = dy / dx
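A minimal usage sketch (xvals and yvals here are made-up example data), returning one slope per consecutive pair of points:

import numpy as np

xvals = np.array([1990., 1991., 1992., 1993.])
yvals = np.array([2.0, 4.0, 8.0, 16.0])

dx = np.diff(xvals)      # spacing between consecutive x values
dy = np.diff(yvals)      # differences between consecutive y values
slopes = dy / dx         # one slope per segment
print(slopes)            # [2. 4. 8.]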
Upvotes: 10
Reputation: 980
As said before, you can use scipy's linregress. Here is how to get just the slope out:
from scipy.stats import linregress
x = [1, 2, 3, 4, 5]
y = [2, 3, 8, 9, 22]
slope, intercept, r_value, p_value, std_err = linregress(x, y)
print(slope)
Keep in mind that, because extra values like r_value and p_value are computed as well, this approach will take longer than calculating only the slope manually. However, linregress is still pretty quick.
Source: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
Upvotes: 7
Reputation: 1021
Here is a visual example of how to estimate the coefficients of a linear regression, and how to calculate the slope and intercept, just as an example for newbies. Happy learning.
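Since the picture itself is not reproduced here, a minimal sketch of the same idea using the textbook least-squares formulas (the data is made up for illustration):

import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 6.])

# slope = covariance(x, y) / variance(x); intercept = mean(y) - slope * mean(x)
slope = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
intercept = y.mean() - slope * x.mean()
print(slope, intercept)  # 0.8 1.8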
Upvotes: -1
Reputation: 4564
Well, it depends on the number of points you have. If you only have two points, go with linregress from scipy.stats. If you have more, go with theilslopes, because it tolerates as much as ~29% outliers in the data and still calculates the best slope. The former simply considers all the samples, not worrying about outliers, and calculates the slope that best fits all of them.
from scipy import stats
slope1 = stats.linregress([2, 4], [1, 2])[0]                             # (xdata, ydata)
slope2 = stats.theilslopes([0.2, 0.5, 0.9, 0.4], [1, 2, 3, 4], 0.9)[0]   # (ydata, xdata, confidence)
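To see the robustness claim in action, here is a small sketch (synthetic data with one deliberate outlier; the variable names are just for illustration):

import numpy as np
from scipy import stats

x = np.arange(10.)
y = 2.0 * x + 1.0
y[8] = 60.0                              # one heavy outlier (should be 17)

print(stats.linregress(x, y).slope)      # ~3.8, dragged away from 2 by the outlier
print(stats.theilslopes(y, x, 0.90)[0])  # 2.0, essentially unaffected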
Upvotes: -1
Reputation: 1889
This clear one-liner should be efficient enough without scipy:
slope = np.polyfit(X,Y,1)[0]
Put together with the data from your question, that looks like:
import numpy as np
Y = np.array([
[ 2.62710000e+11, 3.14454000e+11, 3.63609000e+11, 4.03196000e+11, 4.21725000e+11, 2.86698000e+11, 3.32909000e+11, 4.01480000e+11, 4.21215000e+11, 4.81202000e+11],
[ 3.11612352e+03, 3.65968334e+03, 4.15442691e+03, 4.52470938e+03, 4.65011423e+03, 3.10707392e+03, 3.54692896e+03, 4.20656404e+03, 4.34233412e+03, 4.88462501e+03],
[ 2.21536396e+01, 2.59098311e+01, 2.97401268e+01, 3.04784552e+01, 3.13667639e+01, 2.76377113e+01, 3.27846013e+01, 3.73223417e+01, 3.51249997e+01, 4.42563658e+01]]).T
X = [ 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999]
print(np.polyfit(X, Y, 1)[0])
Output is [1.54983152e+10 9.98749876e+01 1.84564349e+00]
Upvotes: 19
Reputation: 2131
I built upon the other answers and the original regression formula to build a function which works for any tensor. It calculates the slopes of the data along the given axis. So, if you have arbitrary tensors X[i,j,k,l], Y[i,j,k,l] and you want to know the slopes along the third axis for every combination of the other axes, you can call it with calcSlopes(X, Y, axis=2).
import numpy as np
def calcSlopes( x = None, y = None, axis = -1 ):
    assert x is not None or y is not None

    # assume that a single given data argument holds equally
    # spaced y-values (like in matplotlib's plot command)
    if y is None:
        y = x
        x = None

    # move the axis we want to calculate the slopes over to the front,
    # as is necessary for the subtraction of the means;
    # note that the axis 'vanishes' anyway, so we don't need to swap it back
    y = np.swapaxes( y, axis, 0 )
    if x is not None:
        x = np.swapaxes( x, axis, 0 )

    # https://en.wikipedia.org/wiki/Simple_linear_regression
    # beta = sum_i ( X_i - <X> ) ( Y_i - <Y> ) / ( sum_i ( X_i - <X> )^2 )
    if x is None:
        # the axis with the values to reduce must be trailing for broadcast_to,
        # therefore transpose
        x = np.broadcast_to( np.arange( y.shape[0] ), y.T.shape ).T
        x = x - ( x.shape[0] - 1 ) / 2.  # mean of (0,1,...,n-1) is n*(n-1)/2/n
    else:
        x = x - np.mean( x, axis = 0 )
    y = y - np.mean( y, axis = 0 )

    # beta = sum_i x_i y_i / sum_i x_i^2
    slopes = np.sum( np.multiply( x, y ), axis = 0 ) / np.sum( x**2, axis = 0 )
    return slopes
As a convenience, it also works when only equally spaced y data is given. So, for example:
y = np.array( [
[ 1, 2, 3, 4 ],
[ 2, 4, 6, 8 ]
] )
print( calcSlopes( y, axis = 0 ) )
print( calcSlopes( y, axis = 1 ) )
x = np.array( [
[ 0, 2, 4, 6 ],
[ 0, 4, 8, 12 ]
] )
print( calcSlopes( x, y, axis = 1 ) )
Output:
[1. 2. 3. 4.]
[1. 2.]
[0.5 0.5]
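And here is a quick sanity check of the arbitrary-tensor claim, with synthetic 4-D data (shapes and names X4, Y4 are made up; assumes the calcSlopes definition above and a NumPy recent enough for default_rng):

import numpy as np

rng = np.random.default_rng(0)
X4 = rng.random((2, 3, 5, 4))           # arbitrary 4-D tensor
Y4 = 3.0 * X4 + 1.0                     # exact line with slope 3 along every fibre

slopes = calcSlopes(X4, Y4, axis=2)     # regress along the third axis
print(slopes.shape)                     # (3, 2, 4) - the reduced axis is gone, the rest keep the swapped order
print(np.allclose(slopes, 3.0))         # True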
Upvotes: 0
Reputation: 509
A representation that's simpler than the accepted answer:
x = np.linspace(0, 10, 11)
y = np.linspace(0, 20, 11)
y = np.c_[y, y, y]       # three Y series, one per column
X = x - x.mean()
Y = y - y.mean(axis=0)   # centre each column (series) separately
slope = (X.dot(Y)) / (X.dot(X))
The equation for the slope comes from "Vector notation for the slope of a line using simple regression."
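For reference, the equation the snippet implements, with X and Y already mean-centred as above:

$$\text{slope} \;=\; \frac{X \cdot Y}{X \cdot X} \;=\; \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$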
Upvotes: 10
Reputation: 1450
The linear regression calculation is, in one dimension, a vector calculation. This means we can combine the multiplications on the entire Y matrix, and then vectorize the fits using the axis parameter in numpy. In your case that works out to the following
((X*Y).mean(axis=1) - X.mean()*Y.mean(axis=1)) / ((X**2).mean() - (X.mean())**2)
You're not interested in fit quality parameters but most of them can be obtained in a similar manner.
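A runnable sketch of that one-liner, assuming X is the 1-D year vector and Y stacks one series per row as in the question (the series here are made up so the expected slopes are obvious):

import numpy as np

X = np.arange(1990., 2000.)                  # shape (10,)
Y = np.vstack([2.0 * X + 1.0,                # three example series, shape (3, 10)
               -0.5 * X + 4.0,
               3.0 * X - 7.0])

slopes = (((X * Y).mean(axis=1) - X.mean() * Y.mean(axis=1))
          / ((X ** 2).mean() - X.mean() ** 2))

print(slopes)  # ~[ 2.  -0.5  3. ]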
Upvotes: 19
Reputation: 11860
With X and Y defined the same way as in your question, you can use:
dY = (np.roll(Y, -1, axis=1) - Y)[:,:-1]
dX = (np.roll(X, -1, axis=0) - X)[:-1]
slopes = dY / dX
np.roll() aligns the next observation with the current one; you just need to drop the last column, which is the meaningless difference between the last and the first observations. Then you can calculate all the slopes at once, without scipy. Note that these are the point-to-point slopes between consecutive observations, not a single fitted slope per row. In your example dX is always 1, so you can save even more time by just computing slopes = dY.
Upvotes: 2