Reputation: 1560
I am trying to find the fastest and most efficient way to calculate slopes using Numpy and Scipy. I have a data set of three Y variables and one X variable and I need to calculate their individual slopes. For example, I can easily do this one row at a time, as shown below, but I was hoping there was a more efficient way of doing this. I also don't think linregress is the best way to go because I don't need any of the auxiliary variables like intercept, standard error, etc. in my results. Any help is greatly appreciated.
import numpy as np
from scipy import stats
Y = np.array([[2.62710000e+11, 3.14454000e+11, 3.63609000e+11, 4.03196000e+11,
               4.21725000e+11, 2.86698000e+11, 3.32909000e+11, 4.01480000e+11,
               4.21215000e+11, 4.81202000e+11],
              [3.11612352e+03, 3.65968334e+03, 4.15442691e+03, 4.52470938e+03,
               4.65011423e+03, 3.10707392e+03, 3.54692896e+03, 4.20656404e+03,
               4.34233412e+03, 4.88462501e+03],
              [2.21536396e+01, 2.59098311e+01, 2.97401268e+01, 3.04784552e+01,
               3.13667639e+01, 2.76377113e+01, 3.27846013e+01, 3.73223417e+01,
               3.51249997e+01, 4.42563658e+01]])
X = np.array([1990., 1991., 1992., 1993., 1994., 1995., 1996., 1997., 1998., 1999.])
slope_0, intercept, r_value, p_value, std_err = stats.linregress(X, Y[0,:])
slope_1, intercept, r_value, p_value, std_err = stats.linregress(X, Y[1,:])
slope_2, intercept, r_value, p_value, std_err = stats.linregress(X, Y[2,:])
slope_0 = slope_0 / Y[0,:][0]
slope_1 = slope_1 / Y[1,:][0]
slope_2 = slope_2 / Y[2,:][0]
b, a = np.polyfit(X, Y[1,:], 1)
slope_1_a = b / Y[1,:][0]
Upvotes: 39
Views: 197226
Reputation: 222511
The fastest and most efficient way would be to use the native scipy function scipy.stats.linregress, which calculates everything:
slope : slope of the regression line
intercept : intercept of the regression line
r-value : correlation coefficient
p-value : two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero
stderr : Standard error of the estimate
And here is an example:
a = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
b = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
from scipy.stats import linregress
linregress(a, b)
will return:
LinregressResult(slope=0.20833333333333337, intercept=13.375, rvalue=0.14499815458068521, pvalue=0.68940144811669501, stderr=0.50261704627083648)
P.S. Just the mathematical formula for the slope: slope = Σᵢ(xᵢ − x̄)(yᵢ − ȳ) / Σᵢ(xᵢ − x̄)².
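For instance, a quick check of that formula against linregress for the a/b data above (manual_slope is just a name for the hand-computed value):

import numpy as np
from scipy.stats import linregress

a = np.array([15, 12, 8, 8, 7, 7, 7, 6, 5, 3])
b = np.array([10, 25, 17, 11, 13, 17, 20, 13, 9, 15])

# slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
manual_slope = ((a - a.mean()) * (b - b.mean())).sum() / ((a - a.mean()) ** 2).sum()

print(manual_slope)            # ~0.20833
print(linregress(a, b).slope)  # same value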
Upvotes: 73
Reputation: 101
The way I did it is with the np.diff() function, which gives the point-to-point slope between each pair of consecutive samples (rather than a single fitted slope):
dx = np.diff(xvals)
dy = np.diff(yvals)
slopes = dy / dx
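A minimal usage sketch (xvals and yvals here are made-up example data), returning one slope per consecutive pair of points:

import numpy as np

xvals = np.array([1990., 1991., 1992., 1993.])
yvals = np.array([2.0, 4.0, 8.0, 16.0])

dx = np.diff(xvals)      # spacing between consecutive x values
dy = np.diff(yvals)      # differences between consecutive y values
slopes = dy / dx         # one slope per segment
print(slopes)            # [2. 4. 8.]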
Upvotes: 10
Reputation: 980
As said before, you can use scipy's linregress. Here is how to get just the slope out:
from scipy.stats import linregress
x = [1, 2, 3, 4, 5]
y = [2, 3, 8, 9, 22]
slope, intercept, r_value, p_value, std_err = linregress(x, y)
print(slope)
Keep in mind that, because extra values like r_value and p_value are computed as well, this approach will take longer than calculating only the slope manually. However, linregress is still pretty quick.
Source: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
Upvotes: 7
Reputation: 1021
Here is a visual example of how to estimate the coefficients of a linear regression, and how to calculate the slope and intercept, just as an example for newbies. Happy learning.
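Since the picture itself is not reproduced here, a minimal sketch of the same idea using the textbook least-squares formulas (the data is made up for illustration):

import numpy as np

x = np.array([1., 2., 3., 4., 5.])
y = np.array([2., 4., 5., 4., 6.])

# slope = covariance(x, y) / variance(x); intercept = mean(y) - slope * mean(x)
slope = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
intercept = y.mean() - slope * x.mean()
print(slope, intercept)  # 0.8 1.8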
Upvotes: -1
Reputation: 4564
Well, it depends on the number of points you have. If you only have two points, go with linregress from scipy.stats. If you have more, go with theilslopes, because it tolerates as much as ~29% outliers in the data and still calculates the best slope. The former simply considers all the samples, not worrying about outliers, and calculates the slope that best fits all of them.
from scipy import stats
slope1 = stats.linregress([2, 4], [1, 2])[0]                             # (xdata, ydata)
slope2 = stats.theilslopes([0.2, 0.5, 0.9, 0.4], [1, 2, 3, 4], 0.9)[0]   # (ydata, xdata, confidence)
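To see the robustness claim in action, here is a small sketch (synthetic data with one deliberate outlier; the variable names are just for illustration):

import numpy as np
from scipy import stats

x = np.arange(10.)
y = 2.0 * x + 1.0
y[8] = 60.0                              # one heavy outlier (should be 17)

print(stats.linregress(x, y).slope)      # ~3.8, dragged away from 2 by the outlier
print(stats.theilslopes(y, x, 0.90)[0])  # 2.0, essentially unaffected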
Upvotes: -1
Reputation: 1889
This clear one-liner should be efficient enough without scipy:
slope = np.polyfit(X,Y,1)[0]
Put together with the data from your question, that looks like:
import numpy as np
Y = np.array([
[ 2.62710000e+11, 3.14454000e+11, 3.63609000e+11, 4.03196000e+11, 4.21725000e+11, 2.86698000e+11, 3.32909000e+11, 4.01480000e+11, 4.21215000e+11, 4.81202000e+11],
[ 3.11612352e+03, 3.65968334e+03, 4.15442691e+03, 4.52470938e+03, 4.65011423e+03, 3.10707392e+03, 3.54692896e+03, 4.20656404e+03, 4.34233412e+03, 4.88462501e+03],
[ 2.21536396e+01, 2.59098311e+01, 2.97401268e+01, 3.04784552e+01, 3.13667639e+01, 2.76377113e+01, 3.27846013e+01, 3.73223417e+01, 3.51249997e+01, 4.42563658e+01]]).T
X = [ 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999]
print(np.polyfit(X, Y, 1)[0])
Output is [1.54983152e+10 9.98749876e+01 1.84564349e+00]
Upvotes: 19
Reputation: 2131
I built upon the other answers and the original regression formula to build a function which works for any tensor. It calculates the slopes of the data along the given axis. So, if you have arbitrary tensors X[i,j,k,l], Y[i,j,k,l] and you want to know the slopes along the third axis for every combination of the other axes, you can call it with calcSlopes(X, Y, axis=2).
import numpy as np
def calcSlopes( x = None, y = None, axis = -1 ):
    assert x is not None or y is not None

    # assume that a single given data argument holds equally
    # spaced y-values (like in matplotlib's plot command)
    if y is None:
        y = x
        x = None

    # move the axis we want to calculate the slopes over to the front,
    # as is necessary for the subtraction of the means;
    # note that the axis 'vanishes' anyway, so we don't need to swap it back
    y = np.swapaxes( y, axis, 0 )
    if x is not None:
        x = np.swapaxes( x, axis, 0 )

    # https://en.wikipedia.org/wiki/Simple_linear_regression
    # beta = sum_i ( X_i - <X> ) ( Y_i - <Y> ) / ( sum_i ( X_i - <X> )^2 )
    if x is None:
        # the axis with the values to reduce must be trailing for broadcast_to,
        # therefore transpose
        x = np.broadcast_to( np.arange( y.shape[0] ), y.T.shape ).T
        x = x - ( x.shape[0] - 1 ) / 2.  # mean of (0,1,...,n-1) is n*(n-1)/2/n
    else:
        x = x - np.mean( x, axis = 0 )
    y = y - np.mean( y, axis = 0 )

    # beta = sum_i x_i y_i / sum_i x_i^2
    slopes = np.sum( np.multiply( x, y ), axis = 0 ) / np.sum( x**2, axis = 0 )
    return slopes
As a convenience, it also works when only equally spaced y data is given. So, for example:
y = np.array( [
[ 1, 2, 3, 4 ],
[ 2, 4, 6, 8 ]
] )
print( calcSlopes( y, axis = 0 ) )
print( calcSlopes( y, axis = 1 ) )
x = np.array( [
[ 0, 2, 4, 6 ],
[ 0, 4, 8, 12 ]
] )
print( calcSlopes( x, y, axis = 1 ) )
Output:
[1. 2. 3. 4.]
[1. 2.]
[0.5 0.5]
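And here is a quick sanity check of the arbitrary-tensor claim, with synthetic 4-D data (shapes and names X4, Y4 are made up; assumes the calcSlopes definition above and a NumPy recent enough for default_rng):

import numpy as np

rng = np.random.default_rng(0)
X4 = rng.random((2, 3, 5, 4))           # arbitrary 4-D tensor
Y4 = 3.0 * X4 + 1.0                     # exact line with slope 3 along every fibre

slopes = calcSlopes(X4, Y4, axis=2)     # regress along the third axis
print(slopes.shape)                     # (3, 2, 4) - the reduced axis is gone, the rest keep the swapped order
print(np.allclose(slopes, 3.0))         # True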
Upvotes: 0
Reputation: 509
A representation that's simpler than the accepted answer:
x = np.linspace(0, 10, 11)
y = np.linspace(0, 20, 11)
y = np.c_[y, y, y]       # three Y series, one per column
X = x - x.mean()
Y = y - y.mean(axis=0)   # centre each column (series) separately
slope = (X.dot(Y)) / (X.dot(X))
The equation for the slope comes from "Vector notation for the slope of a line using simple regression."
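For reference, the equation the snippet implements, with X and Y already mean-centred as above:

$$\text{slope} \;=\; \frac{X \cdot Y}{X \cdot X} \;=\; \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$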
Upvotes: 10
Reputation: 1450
The linear regression calculation is, in one dimension, a vector calculation. This means we can combine the multiplications on the entire Y matrix, and then vectorize the fits using the axis parameter in numpy. In your case that works out to the following
((X*Y).mean(axis=1) - X.mean()*Y.mean(axis=1)) / ((X**2).mean() - (X.mean())**2)
You're not interested in fit quality parameters but most of them can be obtained in a similar manner.
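A runnable sketch of that one-liner, assuming X is the 1-D year vector and Y stacks one series per row as in the question (the series here are made up so the expected slopes are obvious):

import numpy as np

X = np.arange(1990., 2000.)                  # shape (10,)
Y = np.vstack([2.0 * X + 1.0,                # three example series, shape (3, 10)
               -0.5 * X + 4.0,
               3.0 * X - 7.0])

slopes = (((X * Y).mean(axis=1) - X.mean() * Y.mean(axis=1))
          / ((X ** 2).mean() - X.mean() ** 2))

print(slopes)  # ~[ 2.  -0.5  3. ]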
Upvotes: 19
Reputation: 11860
With X and Y defined the same way as in your question, you can use:
dY = (np.roll(Y, -1, axis=1) - Y)[:,:-1]
dX = (np.roll(X, -1, axis=0) - X)[:-1]
slopes = dY / dX
np.roll() aligns the next observation with the current one; you just need to drop the last column, which is the meaningless difference between the last and the first observations. Then you can calculate all the slopes at once, without scipy. Note that these are the point-to-point slopes between consecutive observations, not a single fitted slope per row. In your example dX is always 1, so you can save even more time by just computing slopes = dY.
Upvotes: 2