rhz

Reputation: 1132

Pandas interpolation method definitions

In the pandas documentation, a number of methods are provided as arguments to pandas.DataFrame.interpolate including

‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).

‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes

However, the scipy documentation indicates the following options:

kind str or int, optional Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of ‘linear’, ‘nearest’, ‘nearest-up’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, or ‘next’. ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point; ‘nearest-up’ and ‘nearest’ differ when interpolating half-integers (e.g. 0.5, 1.5) in that ‘nearest-up’ rounds up and ‘nearest’ rounds down. Default is ‘linear’.

The documentation seems wrong since scipy.interpolate.interp1d does not accept barycentric or polynomial as valid methods. I suppose that barycentric refers to scipy.interpolate.barycentric_interpolate, but what does polynomial refer to? I thought it might be equivalent to the piecewise_polynomial option, but the two give different results.

Also, method='cubicspline' and method='spline' with order=3 give different results. What is the difference between them?

Upvotes: 1

Views: 1401

Answers (1)

ALollz

Reputation: 59579

The pandas interpolate method is an amalgamation of interpolation methods coming from different places in the numpy and scipy libraries.

Currently all of the code is located in pandas/core/missing.py.

At a high level, it splits the interpolation methods into those handled by np.interp and those handled by functions from across the scipy library.

# interpolation methods that dispatch to np.interp
NP_METHODS = ["linear", "time", "index", "values"]

# interpolation methods that dispatch to _interpolate_scipy_wrapper
SP_METHODS = ["nearest", "zero", "slinear", "quadratic", "cubic",
              "barycentric", "krogh", "spline", "polynomial",
              "from_derivatives", "piecewise_polynomial", "pchip",
              "akima", "cubicspline"]
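To make the first group concrete, here is a minimal sketch comparing method="linear" with a direct np.interp call. It assumes that "linear" ignores the index labels and treats the values as equally spaced, as the pandas documentation describes; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 9.0])
missing = s.isna().to_numpy()

# Assumption: method="linear" works on integer positions, not index labels.
positions = np.arange(len(s), dtype=float)
by_numpy = np.interp(positions[missing], positions[~missing], s.to_numpy()[~missing])

# The same gaps filled by pandas.
by_pandas = s.interpolate(method="linear").to_numpy()[missing]

print(by_numpy)
print(by_pandas)
```

Both arrays should agree, which is consistent with "linear" being a thin layer over np.interp.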

Because the SciPy routines live in different functions, missing.py contains a number of additional wrappers that indicate which SciPy routine is used. Most of the methods are passed to scipy.interpolate.interp1d; for a few others, a dict or a dedicated wrapper function points to the specific SciPy routine.

from scipy import interpolate

alt_methods = {
    "barycentric": interpolate.barycentric_interpolate,
    "krogh": interpolate.krogh_interpolate,
    "from_derivatives": _from_derivatives,
    "piecewise_polynomial": _from_derivatives,
}

where the docstring of _from_derivatives within missing.py indicates:

def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False):
    """
    Convenience function for interpolate.BPoly.from_derivatives.
    ...
    """
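As a rough check of that wrapper, the sketch below calls BPoly.from_derivatives directly and compares it with method="from_derivatives". It assumes pandas passes one value per point (function values only, no derivatives), which is my reading of the wrapper rather than something shown above. Under that assumption each segment is a straight line, which would explain why this option differs from 'polynomial'.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

s = pd.Series([0.0, np.nan, 4.0, np.nan, 10.0], index=[0, 1, 2, 3, 5])
known = s.notna().to_numpy()
x = s.index.to_numpy(dtype=float)
y = s.to_numpy()

# Assumption: one entry per breakpoint, i.e. values only and no derivatives,
# so every piece between two known points is a degree-1 polynomial.
bpoly = interpolate.BPoly.from_derivatives(x[known], y[known].reshape(-1, 1))
direct = bpoly(x[~known])

via_pandas = s.interpolate(method="from_derivatives").to_numpy()[~known]

print(direct)
print(via_pandas)
```

Unlike method="linear", this path uses the numerical index values, so the unevenly spaced index above affects the result.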

In short, depending on the method you specify, you end up directly using one of the following:

numpy.interp
scipy.interpolate.interp1d
scipy.interpolate.barycentric_interpolate
scipy.interpolate.krogh_interpolate
scipy.interpolate.BPoly.from_derivatives
scipy.interpolate.Akima1DInterpolator
scipy.interpolate.UnivariateSpline
scipy.interpolate.CubicSpline
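
For the specific methods asked about, the following sketch compares pandas with direct SciPy calls. The mapping it assumes is my reading of missing.py and is not spelled out above: 'polynomial' goes to interp1d with kind=order, 'spline' goes to UnivariateSpline with k=order, and 'cubicspline' goes to CubicSpline with its default boundary conditions.

```python
import numpy as np
import pandas as pd
from scipy import interpolate

s = pd.Series([0.0, 2.0, np.nan, 1.0, 5.0, np.nan, 3.0, 8.0])
missing = s.isna().to_numpy()
x = s.index.to_numpy(dtype=float)
y = s.to_numpy()
xk, yk, xm = x[~missing], y[~missing], x[missing]

def filled(**kwargs):
    """Return only the values pandas filled in for the missing positions."""
    return s.interpolate(**kwargs).to_numpy()[missing]

# Assumed mapping: 'polynomial' -> interp1d(kind=order)
poly_pd = filled(method="polynomial", order=2)
poly_sp = interpolate.interp1d(xk, yk, kind=2)(xm)

# Assumed mapping: 'cubicspline' -> CubicSpline (interpolating spline)
cubic_pd = filled(method="cubicspline")
cubic_sp = interpolate.CubicSpline(xk, yk)(xm)

# Assumed mapping: 'spline' -> UnivariateSpline(k=order) (smoothing spline)
spline_pd = filled(method="spline", order=3)
spline_sp = interpolate.UnivariateSpline(xk, yk, k=3)(xm)

print(poly_pd, poly_sp)
print(cubic_pd, cubic_sp)
print(spline_pd, spline_sp)
```

If the mapping holds, each pair agrees. It also suggests why method='cubicspline' and method='spline' with order=3 differ: CubicSpline passes through every known point, while UnivariateSpline is a smoothing spline whose default smoothing factor s lets the curve miss the data. Extra keyword arguments to interpolate are forwarded to the SciPy routine, so passing s=0 with method='spline' should force it to interpolate the known points as well.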

Upvotes: 1
