Reputation: 345
I have the following dataframe which shows data from Motion Capture, where each column is a marker (i.e. position data) and rows are time:
       LTHMB X  RTHMB X
0      932.109  872.921
1      934.605  873.798
2      932.383  873.998
3      940.946  875.609
4      941.549  875.875
...        ...      ...
14765      NaN  602.700
14766  562.350      NaN
14767  562.394      NaN
14768  562.421      NaN
14769  562.490  602.705
There are some NaN values in the data that I need to fill. I'm not really an expert in this, so I'm not sure what the best way to fill them is.
I know I can do a forward/backward fill, and I have also read about spline interpolation, which seems more sophisticated. The documentation for pandas.DataFrame.interpolate states that for method='spline' you have to specify the order.
What would I use for the order in this case? Each marker has an X, Y and Z. Does that mean I'd use a cubic spline, or is it not that simple?
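To make the options mentioned in the question concrete, here is a minimal sketch on a made-up four-sample series (hypothetical values, not the motion-capture data) comparing forward fill, backward fill and pandas' default linear interpolation:

```python
import numpy as np
import pandas as pd

# Hypothetical toy series with a two-sample gap.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()       # repeat the last valid value: 1, 1, 1, 4
backward = s.bfill()      # repeat the next valid value: 1, 4, 4, 4
linear = s.interpolate()  # straight line between the neighbours: 1, 2, 3, 4

print(pd.DataFrame({"ffill": forward, "bfill": backward, "linear": linear}))
```

Forward and backward fill produce flat steps, which for a moving marker means the position appears to freeze and then jump; interpolation instead draws a curve through the gap.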
Upvotes: 2
Views: 8265
Reputation: 19322
The order of the spline has nothing to do with the number of features (columns) in your dataset: each column is interpolated independently of the others. Before applying an algorithm, it is important to understand how it works and what each of its parameters (such as 'order') controls.
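A quick way to check that columns are filled independently is to compare DataFrame.interpolate with filling each column on its own. The column names follow the question's naming scheme, but the Y and Z columns and all values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical marker with X, Y and Z columns, each with its own gaps.
df = pd.DataFrame({
    "LTHMB X": [932.0, np.nan, 934.0, 936.0],
    "LTHMB Y": [10.0, 20.0, np.nan, 40.0],
    "LTHMB Z": [5.0, np.nan, np.nan, 8.0],
})

# Fills down each column (along the time axis, axis=0).
filled = df.interpolate()

# Filling every column separately gives the same result: values never mix
# across columns, so X, Y and Z do not imply any particular spline order.
by_column = df.apply(lambda col: col.interpolate())

print(filled)
```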
For intuition, a cubic (order=3) spline is built from "piecewise" polynomials of degree three: a separate cubic on each interval, joined so that the curve and its first and second derivatives are continuous.
Note that each polynomial is only valid within its own interval; together they make up the interpolation function. Interpolation estimates values only within the range of the data, whereas extrapolation predicts values outside that range.
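In pandas, the line between interpolation and extrapolation shows up in how leading and trailing NaNs are treated. This sketch (toy values, assumed for illustration) uses the limit_area parameter of Series.interpolate to restrict filling to gaps that have valid data on both sides:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Default linear fill: the leading NaN is left alone, but the trailing NaN
# is filled with the last valid value (a flat extrapolation).
default = s.interpolate()

# limit_area="inside" only fills NaNs surrounded by valid values,
# i.e. it restricts the fill to genuine interpolation.
inside = s.interpolate(limit_area="inside")

print(pd.DataFrame({"default": default, "inside": inside}))
```

For gaps at the very start or end of a recording, limit_area="inside" may be the safer choice, since there is no data on one side to interpolate from.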
The "order" of the spline is the order of these "piecewise" polynomials.
A linear spline (order=1) fits degree-one polynomials (straight lines) between the data points, while a 7th-order spline fits 7th-degree polynomials.
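The effect of the order is easiest to see on data with a known shape. This sketch assumes SciPy is installed (pandas delegates method='spline' to scipy.interpolate.UnivariateSpline) and uses hypothetical samples of y = x**3 with two values removed. Passing s=0, which pandas forwards to UnivariateSpline, forces the spline through the known points; without it, UnivariateSpline fits a smoothing spline. Note that UnivariateSpline only accepts degrees 1 to 5, so a 7th-order spline is not available through this method.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y = x**3 sampled at x = 0..9, with two samples missing.
y = pd.Series(np.arange(10, dtype=float) ** 3)
y.iloc[[4, 6]] = np.nan

# s=0 is passed through to scipy's UnivariateSpline so the fitted spline
# interpolates the known points instead of smoothing them.
linear = y.interpolate(method="spline", order=1, s=0)
cubic = y.interpolate(method="spline", order=3, s=0)

print("order=1:", linear.iloc[[4, 6]].tolist())
print("order=3:", cubic.iloc[[4, 6]].tolist())
```

The order-1 fill draws straight lines between the neighbouring points (76 and 234 here), while the cubic fill recovers the true values 64 and 216 up to floating-point error, only because the underlying curve is itself a cubic. On real marker trajectories neither will be exact, so the choice still has to be checked against your own data.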
Which should you use?
No one can tell you in advance which will fit better. You will have to plot the results to see whether a given interpolation technique produces sensible imputed values.
A more objective way to compare techniques is with R2_score: hide some values you already know, fill them with each technique, and measure how closely the filled values match the true ones.
You can find this approach implemented roughly here.
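The R2_score comparison can be sketched as follows. The helper names (r2, holdout_r2), the 10% hold-out fraction and the sine-wave test signal are assumptions for illustration. The R² formula is the same one sklearn.metrics.r2_score computes, written out in NumPy so scikit-learn is not required; the spline case needs SciPy. Run it on a stretch of a marker column that has no NaNs, so the hidden values have known ground truth:

```python
import numpy as np
import pandas as pd


def r2(y_true, y_pred):
    """Coefficient of determination (same formula as sklearn.metrics.r2_score)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def holdout_r2(series, method, order=None, frac=0.1, seed=0):
    """Hide a random fraction of known values, refill them, and score the fill.

    `series` must contain no NaNs, so every hidden value has a ground truth.
    """
    rng = np.random.default_rng(seed)
    n = len(series)
    # Only hide interior points, so each gap has valid neighbours on both sides.
    hidden = rng.choice(np.arange(1, n - 1), size=int(frac * n), replace=False)
    masked = series.copy()
    masked.iloc[hidden] = np.nan
    # For splines, s=0 makes scipy's UnivariateSpline pass through known points.
    kwargs = {} if order is None else {"order": order, "s": 0}
    filled = masked.interpolate(method=method, **kwargs)
    return r2(series.iloc[hidden], filled.iloc[hidden])


# Hypothetical smooth trajectory standing in for a gap-free marker column.
t = np.linspace(0, 4 * np.pi, 200)
marker = pd.Series(np.sin(t))

r_lin = holdout_r2(marker, "linear")
r_cub = holdout_r2(marker, "spline", order=3)
print(f"linear R2 = {r_lin:.6f}, cubic spline R2 = {r_cub:.6f}")
```

Whichever method scores higher on held-out samples of your own trajectories is a reasonable starting point for the real gaps. Contiguous gaps are harder to fill than isolated missing samples, so it may be worth also hiding runs of consecutive values that resemble the gaps in your data.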
Upvotes: 6