Reputation: 951
I am trying to extrapolate future data points from a data set that contains one continuous value per day for almost 600 days. I am currently fitting a 1st order function to the data using numpy.polyfit
and numpy.poly1d
. In the graph below you can see the curve (blue) and the 1st order function (green). The x-axis is days since beginning. I am looking for an effective way to model this curve in Python in order to extrapolate future data points as accurately as possible. A linear regression isnt accurate enough and Im unaware of any methods of nonlinear regression that can work in this instance.
This solution isnt accurate enough as if I feed
x = dfnew["days_since"]
y = dfnew["nonbrand"]
z = numpy.polyfit(x,y,1)
f = numpy.poly1d(z)
x_new = future_days
y_new = f(x_new)
plt.plot(x,y, '.', x_new, y_new, '-')
EDIT:
I have now tried the curve_fit using a logarithmic function as the curve and data behaviour seems to conform to:
def func(x, a, b):
return a*numpy.log(x)+b
x = dfnew["days_since"]
y = dfnew["nonbrand"]
popt, pcov = curve_fit(func, x, y)
plt.plot( future_days, func(future_days, *popt), '-')
However when I plot it, my Y-values are way off:
Upvotes: 2
Views: 3526
Reputation: 28405
The very general rule of thumb is that if your fitting function is not fitting well enough to your actual data then either:
You are using the wrong function, it is always worth taking a look at:
to come up with some speculation/theorem/guesses about what sort of model might fit better.
Might your process be a logarithmic one, a saturating on, etc. try them!
Finally, if you are not getting a consistent long term trend then you might be able to justify using cubic splines.
Upvotes: 1