Reputation: 1132
The pandas documentation lists a number of values for the method argument of pandas.DataFrame.interpolate, including:
‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).
‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes
However, the scipy documentation indicates the following options:
kind str or int, optional Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of ‘linear’, ‘nearest’, ‘nearest-up’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, or ‘next’. ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point; ‘nearest-up’ and ‘nearest’ differ when interpolating half-integers (e.g. 0.5, 1.5) in that ‘nearest-up’ rounds up and ‘nearest’ rounds down. Default is ‘linear’.
The documentation seems wrong, since scipy.interpolate.interp1d does not accept barycentric or polynomial as valid methods. I suppose that barycentric refers to scipy.interpolate.barycentric_interpolate, but what does polynomial refer to? I thought it might be equivalent to the piecewise_polynomial option, but the two give different results.
Also, method='cubicspline' and method='spline' with order=3 give different results. What's the difference here?
Upvotes: 1
Views: 1401
Reputation: 59579
The pandas interpolate method is an amalgamation of interpolation methods coming from different places in the numpy and scipy libraries.
Currently all of the code is located in pandas/core/missing.py.
At a high level, it splits the interpolation methods into those handled by np.interp and those handled by functions spread throughout the scipy library.
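To make the NumPy side of that split concrete, here is a minimal sketch. It assumes that method="index" hands the index values to np.interp as x-coordinates, and that method="linear" does the same after replacing the index with equally spaced positions. The Series and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: unevenly spaced float index with two interior gaps.
s = pd.Series([0.0, np.nan, 10.0, np.nan, 40.0],
              index=[0.0, 1.0, 4.0, 6.0, 10.0])

x = s.index.to_numpy()
y = s.to_numpy()
known = ~np.isnan(y)

# method="index": interpolate against the actual index values.
by_index = s.interpolate(method="index")
direct_index = np.interp(x[~known], x[known], y[known])

# method="linear": ignore the index and treat rows as equally spaced.
by_position = s.interpolate(method="linear")
pos = np.arange(len(s), dtype=float)
direct_position = np.interp(pos[~known], pos[known], y[known])

print(by_index.to_numpy())     # gaps filled using the index spacing
print(by_position.to_numpy())  # gaps filled using row positions
```

If the assumption holds, the filled values from pandas match the direct np.interp calls in both cases, and the two methods differ from each other only because the index is unevenly spaced.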
# interpolation methods that dispatch to np.interp
NP_METHODS = ["linear", "time", "index", "values"]
# interpolation methods that dispatch to _interpolate_scipy_wrapper
SP_METHODS = ["nearest", "zero", "slinear", "quadratic", "cubic",
              "barycentric", "krogh", "spline", "polynomial",
              "from_derivatives", "piecewise_polynomial", "pchip",
              "akima", "cubicspline"]
Because the scipy methods are spread across several different functions, missing.py contains a number of additional wrappers that select the right scipy routine. Most of the methods are passed off to scipy.interpolate.interp1d; for a few others, however, a dict or a dedicated wrapper function points to the specific scipy routine.
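For the interp1d group, a rough sketch of what that hand-off appears to look like is below. It assumes, based on reading missing.py rather than on the documentation, that the method string is passed to interp1d as kind, and that for method="polynomial" the integer order is passed as kind instead. The data are made up.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

# Hypothetical data on a default integer index with two interior gaps.
s = pd.Series([1.0, 4.0, np.nan, 16.0, 25.0, np.nan, 49.0, 64.0])
x = np.arange(len(s), dtype=float)
y = s.to_numpy()
known = ~np.isnan(y)

# Named kind: the method string is assumed to be handed over as `kind`.
via_quadratic = s.interpolate(method="quadratic").to_numpy()[~known]
direct_quadratic = interp1d(x[known], y[known], kind="quadratic")(x[~known])

# "polynomial": assumed to pass the integer `order` as `kind`, which
# interp1d treats as the order of a spline interpolator.
via_polynomial = s.interpolate(method="polynomial", order=2).to_numpy()[~known]
direct_order2 = interp1d(x[known], y[known], kind=2)(x[~known])

print(via_quadratic, direct_quadratic)
print(via_polynomial, direct_order2)
```

Under that assumption, method="polynomial" with order=2 is the same computation as method="quadratic", which would explain why "polynomial" is documented as an interp1d method even though interp1d has no kind of that name.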
from scipy import interpolate
alt_methods = {
    "barycentric": interpolate.barycentric_interpolate,
    "krogh": interpolate.krogh_interpolate,
    "from_derivatives": _from_derivatives,
    "piecewise_polynomial": _from_derivatives,
}
where the docstring of _from_derivatives within missing.py indicates:
def _from_derivatives(xi, yi, x, order=None, der=0, extrapolate=False):
    """
    Convenience function for interpolate.BPoly.from_derivatives.
    ...
    """
So, TL;DR: depending on the method you specify, you end up directly using one of the following:
numpy.interp
scipy.interpolate.interp1d
scipy.interpolate.barycentric_interpolate
scipy.interpolate.krogh_interpolate
scipy.interpolate.BPoly.from_derivatives
scipy.interpolate.Akima1DInterpolator
scipy.interpolate.UnivariateSpline
scipy.interpolate.CubicSpline
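A few entries in that list can be spot-checked by calling SciPy directly on the known points and comparing with what pandas fills in. This is only a sketch with made-up data. It assumes pandas calls each routine with default arguments, apart from passing order as k for "spline".

```python
import numpy as np
import pandas as pd
from scipy import interpolate

# Hypothetical data on a default integer index with two interior gaps.
s = pd.Series([0.0, 0.8, np.nan, 0.1, -0.8, np.nan, -0.3, 0.7, 1.0])
x = np.arange(len(s), dtype=float)
y = s.to_numpy()
known = ~np.isnan(y)
xk, yk, xm = x[known], y[known], x[~known]

# Direct SciPy calls, assumed to mirror what pandas does internally.
direct = {
    "barycentric": interpolate.barycentric_interpolate(xk, yk, xm),
    "krogh": interpolate.krogh_interpolate(xk, yk, xm),
    "cubicspline": interpolate.CubicSpline(xk, yk)(xm),
    "spline": interpolate.UnivariateSpline(xk, yk, k=3)(xm),
}

# The same methods through pandas; only "spline" needs an order.
via_pandas = {}
for name in direct:
    extra = {"order": 3} if name == "spline" else {}
    via_pandas[name] = s.interpolate(method=name, **extra).to_numpy()[~known]

for name in direct:
    print(f"{name:12s} pandas={via_pandas[name]}  scipy={direct[name]}")
```

Note that UnivariateSpline is a smoothing spline: with its default smoothing factor it is not required to pass through the data points, whereas CubicSpline interpolates them exactly. That is a plausible reason why method='spline' with order=3 and method='cubicspline' give different results.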
Upvotes: 1