BLL27

Reputation: 951

Extrapolating data from a curve using Python

I am trying to extrapolate future data points from a data set that contains one continuous value per day for almost 600 days. I am currently fitting a 1st-order function to the data using numpy.polyfit and numpy.poly1d. In the graph below you can see the curve (blue) and the 1st-order function (green). The x-axis is days since the beginning. I am looking for an effective way to model this curve in Python in order to extrapolate future data points as accurately as possible. A linear regression isn't accurate enough, and I'm unaware of any nonlinear regression methods that would work in this instance.

This solution isn't accurate enough as if I feed

[image: the data curve (blue) and the 1st-order fit (green)]

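import numpy
import matplotlib.pyplot as plt

# dfnew and future_days are defined elsewhere
x = dfnew["days_since"]
y = dfnew["nonbrand"]

# Fit a 1st-order polynomial (straight line) to the data
z = numpy.polyfit(x, y, 1)
f = numpy.poly1d(z)

# Evaluate the fitted line on the future days
x_new = future_days
y_new = f(x_new)

plt.plot(x, y, '.', x_new, y_new, '-')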

EDIT:

I have now tried curve_fit with a logarithmic function, as the curve and the data behaviour seem to conform to one:

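import numpy
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b):
    # Logarithmic model; numpy.log(x) requires x > 0
    return a * numpy.log(x) + b

x = dfnew["days_since"]
y = dfnew["nonbrand"]

# Least-squares estimate of a and b
popt, pcov = curve_fit(func, x, y)

plt.plot(future_days, func(future_days, *popt), '-')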

However when I plot it, my Y-values are way off:

[image: plot of the logarithmic fit]

Upvotes: 2

Views: 3526

Answers (1)

Steve Barnes

Reputation: 28405

A very general rule of thumb is that if your fitting function does not fit your actual data well enough, then either:

  • You are using the function wrong, e.g. you are using a 1st-order polynomial. If you are convinced that the process is polynomial, try higher-order polynomials.
  • You are using the wrong function. It is always worth taking a look at:

    1. your data curve, and
    2. what you know about the process that is generating the data

    to come up with some speculation, theory or guesses about what sort of model might fit better.
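For the first point, a rough sketch of how you might compare polynomial orders. The data here is synthetic (a log-shaped curve plus noise, purely an assumption standing in for your real series), and the names days, values and rms_residual are made up for the illustration:

```python
import numpy as np

# Synthetic stand-in for the real series (an assumption, not the asker's data):
# a log-shaped curve with some noise, one value per day for 600 days.
rng = np.random.default_rng(0)
days = np.arange(1, 601, dtype=float)
values = 50.0 * np.log(days) + 100.0 + rng.normal(0.0, 5.0, days.size)

def rms_residual(degree):
    """Fit a polynomial of the given degree and return its in-sample RMS residual."""
    # Polynomial.fit rescales the x-domain internally, which keeps
    # higher-degree fits numerically well conditioned.
    poly = np.polynomial.Polynomial.fit(days, values, degree)
    return float(np.sqrt(np.mean((poly(days) - values) ** 2)))

# Compare a straight line with 2nd- and 3rd-order polynomials.
residuals = {degree: rms_residual(degree) for degree in (1, 2, 3)}
for degree, rms in residuals.items():
    print(f"degree {degree}: RMS residual {rms:.2f}")
```

Bear in mind that a lower in-sample residual does not mean better extrapolation: higher-order polynomials tend to diverge quickly outside the fitted range, so check the forecast against held-back data before trusting it.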

Might your process be a logarithmic one, a saturating one, etc.? Try them!
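As a minimal sketch of "try them", you could fit each candidate with scipy.optimize.curve_fit and compare the residuals. The saturating form top * (1 - exp(-rate * x)) is just one possible choice, and the data, the starting guess p0 and the helper names below are assumptions for illustration only:

```python
import numpy as np
from scipy.optimize import curve_fit

def log_model(x, a, b):
    # Logarithmic growth; only defined for x > 0.
    return a * np.log(x) + b

def saturating_model(x, top, rate):
    # Rises quickly at first and levels off towards `top`.
    return top * (1.0 - np.exp(-rate * x))

# Synthetic stand-in data (an assumption): a saturating process plus noise.
rng = np.random.default_rng(1)
days = np.arange(1, 601, dtype=float)
values = saturating_model(days, 400.0, 0.01) + rng.normal(0.0, 5.0, days.size)

# Fit both candidates; the nonlinear model gets a rough starting guess.
log_params, _ = curve_fit(log_model, days, values)
sat_params, _ = curve_fit(
    saturating_model, days, values, p0=(values.max(), 1.0 / days.mean())
)

def sum_sq(model, params):
    """Sum of squared residuals of a fitted model on the data."""
    return float(np.sum((model(days, *params) - values) ** 2))

log_error = sum_sq(log_model, log_params)
sat_error = sum_sq(saturating_model, sat_params)
print(f"log model SSE: {log_error:.0f}, saturating model SSE: {sat_error:.0f}")
```

Whichever candidate leaves the smaller residual, and makes sense given what you know about the process, is the better starting point for extrapolation.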

Finally, if you are not getting a consistent long-term trend, then you might be able to justify using cubic splines.
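One way this could look is with a smoothing cubic spline from scipy.interpolate.UnivariateSpline (k=3). This is only a sketch: the data is synthetic, and the smoothing factor is an assumed value based on a guessed noise level, which you would need to tune for real data:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic stand-in data (an assumption): no steady trend, just a slow wobble plus noise.
rng = np.random.default_rng(2)
days = np.arange(1, 601, dtype=float)
noise_sd = 5.0
values = 100.0 + 20.0 * np.sin(days / 60.0) + rng.normal(0.0, noise_sd, days.size)

# k=3 gives a cubic spline; s caps the sum of squared residuals.
# n * sigma^2 is a common starting point when the noise level is roughly known.
smoothing = days.size * noise_sd ** 2
spline = UnivariateSpline(days, values, k=3, s=smoothing)

in_sample_rms = float(np.sqrt(np.mean((spline(days) - values) ** 2)))

# By default the spline extrapolates when evaluated outside the fitted range.
future_days = np.arange(601, 631, dtype=float)
forecast = spline(future_days)
print(f"in-sample RMS: {in_sample_rms:.2f}, first forecast: {forecast[0]:.1f}")
```

A spline only continues the last cubic piece beyond the data, so the extrapolation is really only usable for a short horizon.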

Upvotes: 1
