Reputation: 5378
I have a list of numbers in Python, like this:
x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
What's the best way to find the trend in these numbers? I'm not interested in predicting what the next number will be, I just want to output the trend for many sets of numbers so that I can compare the trends.
Edit: By trend, I mean that I'd like a numerical representation of whether the numbers are increasing or decreasing and at what rate. I'm not massively mathematical, so there's probably a proper name for this!
Edit 2: It looks like what I really want is the coefficient of the linear best fit. What's the best way to get this in Python?
Upvotes: 39
Views: 79770
Reputation: 406
You can simply use the scipy library:
import numpy as np
from scipy.stats import linregress
data = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
x = np.arange(1, len(data) + 1)  # time index starting at 1
y = np.array(data)
res = linregress(x, y)
print(f'Equation: {res.slope:.3f} * t + {res.intercept:.3f}, R^2: {res.rvalue ** 2:.2f}')
res
Output:
Equation: 4.325 * t + 13.275, R^2: 0.66
LinregressResult(slope=4.325274725274725, intercept=13.274725274725277, rvalue=0.8096297800892154, pvalue=0.0004497809466484867, stderr=0.9051717124425395, intercept_stderr=7.707259409345618)
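For readers who want to see where the slope and R^2 figures come from without scipy, here is a minimal pure-Python sketch. The helper name slope_and_r2 is made up for illustration, and it assumes the same 1-based time index as the answer above.

```python
def slope_and_r2(y):
    """Least-squares slope and R^2 of y against the 1-based index 1..n."""
    n = len(y)
    x = range(1, n + 1)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, r2


data = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
slope, r2 = slope_and_r2(data)
print(f"slope: {slope:.3f}, R^2: {r2:.2f}")
```

R^2 is useful when comparing trends because it indicates how well a straight line describes each series. Note that this sketch divides by zero for a constant series, so a real version would need to handle that case.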
Upvotes: 1
Reputation: 86128
You can find the OLS coefficient using numpy:
import numpy as np
y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
x = []
x.append(list(range(len(y))))  # time variable
x.append([1] * len(y))  # column of ones, which adds the intercept
y = np.matrix(y).T
x = np.matrix(x).T
betas = (x.T * x).I * x.T * y  # normal equations: (X'X)^-1 X'y
Results:
>>> betas
matrix([[ 4.32527473], #coefficient on the time variable
[ 17.6 ]]) #coefficient on the intercept
Since the coefficient on the trend variable is positive, observations in your variable are increasing over time.
Upvotes: 2
Reputation: 10825
Possibly you mean you want to plot these numbers on a graph and find a straight line through them where the overall distance between the line and the numbers is minimized? This is called a linear regression:
def linreg(X, Y):
    """
    Return a, b in the solution to y = ax + b such that the root mean square
    distance between the trend line and the original points is minimized.
    """
    N = len(X)
    Sx = Sy = Sxx = Sxy = 0.0
    for x, y in zip(X, Y):
        Sx += x
        Sy += y
        Sxx += x * x
        Sxy += x * y
    det = Sxx * N - Sx * Sx
    return (Sxy * N - Sy * Sx) / det, (Sxx * Sy - Sx * Sxy) / det

x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
a, b = linreg(range(len(x)), x)  # your x, y are switched from standard notation
The trend line is unlikely to pass through your original points, but it will be as close to them as a straight line can get. Using the gradient and intercept values of this trend line (a, b) you will be able to extrapolate the line past the end of the array:
extrapolatedtrendline = [a * index + b for index in range(20)]  # replace 20 with desired trend length
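Since the question is about comparing the trends of many sets of numbers, the gradient alone can serve as the comparison value. The following is a rough sketch of that idea, not a definitive implementation: trend_slope is a hypothetical helper, and the "falling" and "flat" series are invented purely for illustration.

```python
def trend_slope(values):
    """Least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


# Only "original" comes from the question; the other series are made up.
series = {
    "original": [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81],
    "falling": [50, 45, 47, 40, 38, 30],
    "flat": [10, 10, 10, 10],
}
slopes = {name: trend_slope(vals) for name, vals in series.items()}

# Rank the series from steepest increase to steepest decrease.
for name, s in sorted(slopes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {s:+.3f} per step")
```

The slopes are in units of "value per index step", so series on very different scales may need normalizing (for example, dividing each slope by the series mean) before they are compared.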
Upvotes: 33
Reputation: 2539
Compute the beta coefficient.
y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
x = range(1, len(y) + 1)

def var(X):
    # Sample variance (divides by n - 1).
    S = 0.0
    SS = 0.0
    for x in X:
        S += x
        SS += x * x
    xbar = S / float(len(X))
    return (SS - len(X) * xbar * xbar) / (len(X) - 1.0)

def cov(X, Y):
    # Sample covariance; float() guards against integer division on Python 2.
    n = float(len(X))
    xbar = sum(X) / n
    ybar = sum(Y) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / (n - 1)

def beta(x, y):
    return cov(x, y) / var(x)

print(beta(x, y))  # approximately 4.3253
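To see why this ratio is the slope of the least-squares line, the arithmetic for this data set can be written out as a worked check. It assumes x = 1, ..., 14 and the y values above; the (n - 1) factors in the covariance and variance cancel in the ratio.

```latex
\bar{x} = 7.5, \qquad \bar{y} = \frac{640}{14} \approx 45.714

\sum_{i=1}^{14} (x_i - \bar{x})(y_i - \bar{y}) = 984, \qquad
\sum_{i=1}^{14} (x_i - \bar{x})^2 = 227.5

\beta = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
      = \frac{984 / 13}{227.5 / 13}
      = \frac{984}{227.5} \approx 4.3253
```

This agrees with the slope of about 4.325 reported by the regression-based answers.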
Upvotes: -2
Reputation: 63707
The link provided by Keith, or the answer from Riaz, might help you get a polynomial fit, but it is generally better to use a library when one is available. For the problem at hand, numpy provides a polynomial fit function called polyfit, which can fit the data with a polynomial of any degree.
Here is an example using numpy to fit the data to a linear equation of the form y = ax + b:
>>> import numpy as np
>>> data = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
>>> x = np.arange(0, len(data))
>>> y = np.array(data)
>>> z = np.polyfit(x, y, 1)
>>> print("{0:.4f}x + {1:.4f}".format(*z))
4.3253x + 17.6000
Similarly, a quadratic fit would be:
>>> z = np.polyfit(x, y, 2)
>>> print("{0:.4f}x^2 + {1:.4f}x + {2:.4f}".format(*z))
0.3111x^2 + 0.2806x + 25.6893
Upvotes: 28
Reputation: 23265
You could do a least squares fit of the data.
Using the formula from this page:
y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
N = len(y)
x = range(N)
# B is the slope, A is the intercept.
B = (sum(x[i] * y[i] for i in range(N)) - 1./N*sum(x)*sum(y)) / (sum(x[i]**2 for i in range(N)) - 1./N*sum(x)**2)
A = 1.*sum(y)/N - B * 1.*sum(x)/N
print("%f + %f * x" % (A, B))
This prints the starting value (intercept) and delta (slope) of the best-fit line.
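Answers in this thread report an intercept of either 17.6 or about 13.275 for the same data. The difference appears to come only from whether the time index starts at 0 or at 1; the slope is unaffected. A small sketch to illustrate, using the same formula as above (fit_line is a hypothetical helper name):

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(y)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    slope = (sxy - sx * sy / n) / (sxx - sx * sx / n)
    intercept = sy / n - slope * sx / n
    return intercept, slope


y = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
a0, b0 = fit_line(range(len(y)), y)         # x = 0, 1, ..., 13
a1, b1 = fit_line(range(1, len(y) + 1), y)  # x = 1, 2, ..., 14
print(f"0-based: {a0:.3f} + {b0:.3f} * x")
print(f"1-based: {a1:.3f} + {b1:.3f} * x")
```

Shifting the index by one moves the intercept by exactly one slope, so for comparing trends only the slope matters and either indexing convention works as long as it is applied consistently.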
Upvotes: 6
Reputation: 27282
I agree with Keith, I think you're probably looking for a linear least squares fit (if all you want to know is if the numbers are generally increasing or decreasing, and at what rate). The slope of the fit will tell you at what rate they're increasing. If you want a visual representation of a linear least squares fit, try Wolfram Alpha:
Update: If you want to implement a linear regression in Python, I recommend starting with the explanation at Mathworld:
http://mathworld.wolfram.com/LeastSquaresFitting.html
It's a very straightforward explanation of the algorithm, and it practically writes itself. In particular, you want to pay close attention to equations 16-21, 27, and 28.
Try writing the algorithm yourself, and if you have problems, you should open another question.
Upvotes: 4
Reputation: 208405
Here is one way to get an increasing/decreasing trend:
>>> x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
>>> trend = [b - a for a, b in zip(x, x[1:])]
>>> trend
[22, -5, 9, -4, 17, -22, 5, 13, -13, 21, 39, -26, 13]
In the resulting list trend, trend[0] can be interpreted as the increase from x[0] to x[1], trend[1] would be the increase from x[1] to x[2], etc. Negative values in trend mean that the value in x decreased from one index to the next.
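If a single number is wanted from these differences, one option is to average them. The sketch below does that; mean_step is a name chosen here for illustration. Because the differences telescope, the average depends only on the first and last values, which is worth keeping in mind when comparing it with a least-squares slope.

```python
x = [12, 34, 29, 38, 34, 51, 29, 34, 47, 34, 55, 94, 68, 81]
trend = [b - a for a, b in zip(x, x[1:])]

# The differences telescope, so their sum is just last minus first.
assert sum(trend) == x[-1] - x[0]

mean_step = sum(trend) / len(trend)
print(f"average step: {mean_step:.3f}")
```

For this data the average step is 69 / 13, roughly 5.31, compared with a least-squares slope of about 4.325. The two differ because the average step ignores every value between the endpoints, so it is more sensitive to noise in the first and last observations.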
Upvotes: 7