Reputation: 11177
Let's say I have a simple data set. Perhaps in dictionary form, it would look like this:
{1:5, 2:10, 3:15, 4:20, 5:25}
(the order is always ascending).
What I want to do is logically figure out what the next point of data is most likely to be. In this case, for example, it would be {6: 30}.
What would be the best way to do this?
Upvotes: 17
Views: 70801
Reputation: 984
As pointed out by this answer to a related question, as of version 0.17.0 of scipy, there is an option in scipy.interpolate.interp1d that allows linear extrapolation. In your case, you could do:
>>> import numpy as np
>>> from scipy import interpolate
>>> x = [1, 2, 3, 4, 5]
>>> y = [5, 10, 15, 20, 25]
>>> f = interpolate.interp1d(x, y, fill_value = "extrapolate")
>>> print(f(6))
30.0
Unfortunately, as of 2024, there is a warning in the docs for interp1d:
This class is considered legacy and will no longer receive updates. This could also mean it will be removed in future SciPy versions.
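Given that warning, here is a minimal sketch of one non-legacy route, assuming scipy.interpolate.make_interp_spline is an acceptable substitute (this is a swapped-in technique, not part of the original answer). With k=1 it builds a piecewise-linear spline, and the object it returns extrapolates beyond the data range by default:

```python
import numpy as np
from scipy.interpolate import make_interp_spline

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 10, 15, 20, 25], dtype=float)

# k=1 gives a piecewise-linear spline. The returned BSpline object
# extrapolates outside [x[0], x[-1]] by default, continuing the end segments.
f = make_interp_spline(x, y, k=1)

next_y = float(f(6))
print(next_y)
```

For the data in the question this should continue the last segment, so the value at x = 6 lands on the same line as the other points.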
Upvotes: 10
Reputation: 14255
Here's a funny one using only numpy, in case you do not want to depend on scipy:
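from numpy.polynomial.polynomial import polyfit, polyval
from numpy import interp, ndarray, piecewise

def interp1d(x: ndarray, xp, fp):
    """1D piecewise linear interpolation with linear extrapolation."""
    # Note: piecewise returns an array with the same dtype as x, so pass floats.
    return piecewise(
        x,
        [x < xp[0], (x >= xp[0]) & (x <= xp[-1]), x > xp[-1]],
        [
            # left of the data: line through the first two points
            lambda xi: polyval(xi, polyfit(xp[:2], fp[:2], 1)),
            # inside the data range: ordinary interpolation
            lambda xi: interp(xi, xp, fp),
            # right of the data: line through the last two points
            lambda xi: polyval(xi, polyfit(xp[-2:], fp[-2:], 1)),
        ],
    )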
This uses plain numpy.interp for interpolation, reverts to a linear polynomial fit to extrapolate out-of-bounds values, and uses numpy.piecewise to string them together.
Instead of polyval(..., polyfit(...)), you could also write the linear extrapolation functions yourself, for example:
lambda xi: fp[0] + np.diff(fp[:2]) / np.diff(xp[:2]) * (xi - xp[0])
and so on.
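The hand-written variant can be spelled out as a small standalone function. This is only a sketch: the name linear_extrap is made up for illustration, and it uses numpy.where instead of numpy.piecewise to stitch the three regions together.

```python
import numpy as np

def linear_extrap(x, xp, fp):
    """Interpolate inside [xp[0], xp[-1]] and extend the end segments outside."""
    x = np.asarray(x, dtype=float)
    xp = np.asarray(xp, dtype=float)
    fp = np.asarray(fp, dtype=float)

    # np.interp clamps to fp[0] / fp[-1] outside the data range,
    # so out-of-range values are overwritten below.
    y = np.interp(x, xp, fp)

    # Slopes of the first and last segments.
    left_slope = (fp[1] - fp[0]) / (xp[1] - xp[0])
    right_slope = (fp[-1] - fp[-2]) / (xp[-1] - xp[-2])

    y = np.where(x < xp[0], fp[0] + left_slope * (x - xp[0]), y)
    y = np.where(x > xp[-1], fp[-1] + right_slope * (x - xp[-1]), y)
    return y

result = linear_extrap([0, 2.5, 6], [1, 2, 3, 4, 5], [5, 10, 15, 20, 25])
print(result)
```

Converting the inputs to float arrays up front avoids integer truncation when the query points happen to be integers.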
Upvotes: 0
Reputation: 1215
After discussing with you in the Python chat - you're fitting your data to an exponential. This should give a relatively good indicator since you're not looking for long term extrapolation.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def exponential_fit(x, a, b, c):
    return a*np.exp(-b*x) + c

if __name__ == "__main__":
    x = np.array([0, 1, 2, 3, 4, 5])
    y = np.array([30, 50, 80, 160, 300, 580])
    fitting_parameters, covariance = curve_fit(exponential_fit, x, y)
    a, b, c = fitting_parameters

    next_x = 6
    next_y = exponential_fit(next_x, a, b, c)

    plt.plot(y)
    plt.plot(np.append(y, next_y), 'ro')
    plt.show()
The red dot at the far right of the plot shows the next "predicted" point.
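If the offset c can be dropped, a simpler alternative is to fit a straight line to log(y). The sketch below does that with numpy only. It assumes a pure exponential y = a*exp(b*x), which is a different model from the three-parameter one above, and the helper name predict is made up for illustration.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([30, 50, 80, 160, 300, 580], dtype=float)

# Assumed model: y = a * exp(b * x), so log(y) = log(a) + b * x.
# For degree 1, np.polyfit returns [slope, intercept] = [b, log(a)].
b, log_a = np.polyfit(x, np.log(y), 1)
a = np.exp(log_a)

def predict(x_new):
    """Evaluate the fitted exponential at x_new."""
    return a * np.exp(b * x_new)

next_y = predict(6.0)
print(next_y)
```

Fitting in log space weights relative errors rather than absolute ones, so the prediction will generally differ somewhat from what curve_fit gives on the raw values.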
Upvotes: 8
Reputation: 19547
You can also use numpy's polyfit:
import numpy as np
data = np.array([[1,5], [2,10], [3,15], [4,20], [5,25]])
fit = np.polyfit(data[:,0], data[:,1], 1) #The use of 1 signifies a linear fit.
fit
[ 5.00000000e+00 1.58882186e-15] #y = 5x + 0
line = np.poly1d(fit)
new_points = np.arange(5)+6
new_points
[ 6, 7, 8, 9, 10]
line(new_points)
[ 30. 35. 40. 45. 50.]
This allows you to alter the degree of the polynomial fit quite easily, as the function polyfit takes the following arguments: np.polyfit(x data, y data, degree). Shown is a linear fit where the returned array looks like fit[0]*x^n + fit[1]*x^(n-1) + ... + fit[n]*x^0 for any degree n. The poly1d function allows you to turn this array into a function that returns the value of the polynomial at any given value x.
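To make the coefficient ordering concrete, here is a small sketch with a degree-2 fit. The sample data is made up for illustration and is generated from y = 2x^2 + 3x + 1, so the fit should recover those coefficients with the highest power first.

```python
import numpy as np

# Made-up sample data lying exactly on y = 2x^2 + 3x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x**2 + 3 * x + 1

coeffs = np.polyfit(x, y, 2)  # degree 2 gives 3 coefficients, highest power first
poly = np.poly1d(coeffs)      # callable polynomial built from the coefficients

# Evaluating by hand matches poly(5.0):
# coeffs[0]*x^2 + coeffs[1]*x^1 + coeffs[2]*x^0
manual = coeffs[0] * 5.0**2 + coeffs[1] * 5.0 + coeffs[2]
print(coeffs, poly(5.0), manual)
```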
In general, extrapolation without a well-understood model will have unreliable results at best.
Exponential curve fitting.
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
x = np.linspace(0,4,5)
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))
fit ,cov = curve_fit(func, x, yn)
fit
[ 2.67217435 1.21470107 0.52942728] #Variables
y
[ 3. 1.18132948 0.68568395 0.55060478 0.51379141] #Original data
func(x,*fit)
[ 3.20160163 1.32252521 0.76481773 0.59929086 0.5501627 ] #Fit to original + noise
Upvotes: 14
Reputation: 368934
Using scipy.interpolate.splrep:
>>> from scipy.interpolate import splrep, splev
>>> d = {1:5, 2:10, 3:15, 4:20, 5:25}
>>> x, y = zip(*d.items())
>>> spl = splrep(x, y, k=1, s=0)
>>> splev(6, spl)
array(30.0)
>>> splev(7, spl)
array(35.0)
>>> int(splev(7, spl))
35
>>> splev(10000000000, spl)
array(50000000000.0)
>>> int(splev(10000000000, spl))
50000000000
See How to make scipy.interpolate give an extrapolated result beyond the input range?
Upvotes: 0
Reputation: 69172
Since your data is approximately linear you can do a linear regression, and then use the results from that regression to calculate the next point, using y = w[0]*x + w[1] (keeping the notation from the linked example for y = mx + b).
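As a minimal sketch of that regression, assuming numpy.linalg.lstsq is used for the least-squares step (the linked example is not reproduced here, so this choice is an assumption), the weights w can be computed and applied like this:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 10, 15, 20, 25], dtype=float)

# Design matrix with one column for x and one column of ones,
# so that A @ w = w[0]*x + w[1].
A = np.vstack([x, np.ones_like(x)]).T

# Least-squares solution; w[0] is the slope m and w[1] the intercept b.
w, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

next_x = 6
next_y = w[0] * next_x + w[1]
print(w, next_y)
```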
If your data is not approximately linear and you don't have some other theoretical form for a regression, then general extrapolations (using say polynomials or splines) are much less reliable as they can go a bit crazy beyond the known data points. For example, see the accepted answer here.
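To illustrate how a flexible fit can misbehave outside the data, the sketch below compares a straight line with a degree-4 polynomial. The y values are hypothetical: they are the question's y = 5x with small made-up perturbations, and the "true" value of 50 at x = 10 is an assumption of this example.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
# Hypothetical data: y = 5x with small made-up perturbations.
y = np.array([5.0, 10.5, 14.5, 20.5, 24.5])

linear = np.poly1d(np.polyfit(x, y, 1))
quartic = np.poly1d(np.polyfit(x, y, 4))  # passes through all five points

# Both fits agree closely inside the data range,
# but they diverge sharply well beyond it.
x_far = 10.0
linear_pred = linear(x_far)
quartic_pred = quartic(x_far)
print(linear_pred, quartic_pred)
```

The quartic reproduces every known point exactly, yet its value at x = 10 ends up far from the underlying trend, while the straight line stays close to it.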
Upvotes: 1