Reputation: 884
I'm having trouble avoiding negative values when interpolating. I have the following data in a DataFrame:
current_country =
idx Country Region Rank Score GDP capita Family Life Expect. Freedom Trust Gov. Generosity Residual Year
289 South Sudan Sub-Saharan Africa 143 3.83200 0.393940 0.185190 0.157810 0.196620 0.130150 0.258990 2.509300 2016
449 South Sudan Sub-Saharan Africa 147 3.59100 0.397249 0.601323 0.163486 0.147062 0.116794 0.285671 1.879416 2017
610 South Sudan Sub-Saharan Africa 154 3.25400 0.337000 0.608000 0.177000 0.112000 0.106000 0.224000 1.690000 2018
765 South Sudan Sub-Saharan Africa 156 2.85300 0.306000 0.575000 0.295000 0.010000 0.091000 0.202000 1.374000 2019
I want to estimate an additional year (2015, shown below) using pandas' df.interpolate():
new_row =
idx Country Region Rank Score GDP capita Family Life Expect. Freedom Trust Gov. Generosity Residual Year
593 South Sudan Sub-Saharan Africa 0 np.nan np.nan np.nan np.nan np.nan np.nan np.nan np.nan 2015
I create a DataFrame in which every column to be interpolated is NaN (as above), append it to the original DataFrame, and then interpolate to fill in the NaN cells.
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
interpol_subset = pd.concat([current_country, new_row])
interpol_subset = interpol_subset.interpolate(method="pchip", order=2)
This produces the following DataFrame:
idx Country Region Rank Score GDP capita Family Life Expect. Freedom Trust Gov. Generosity Residual Year
289 South Sudan Sub-Saharan Africa 143 3.83200 0.393940 0.185190 0.157810 0.196620 0.130150 0.258990 2.509300 2016
449 South Sudan Sub-Saharan Africa 147 3.59100 0.397249 0.601323 0.163486 0.147062 0.116794 0.285671 1.879416 2017
610 South Sudan Sub-Saharan Africa 154 3.25400 0.337000 0.608000 0.177000 0.112000 0.106000 0.224000 1.690000 2018
765 South Sudan Sub-Saharan Africa 156 2.85300 0.306000 0.575000 0.295000 0.010000 0.091000 0.202000 1.374000 2019
4 South Sudan Sub-Saharan Africa 0 2.39355 0.313624 0.528646 0.434473 -0.126247 0.072480 0.238480 0.963119 2015
The issue: in the last row, the value in "Freedom" is negative. Is there a way to parameterize df.interpolate so that it doesn't produce negative values? I can't find anything in the documentation. Apart from that negative value, I'm fine with the estimates (although they're a bit skewed).
I considered simply flipping the negative to a positive, but the "Score" value is a sum of all the other continuous features and I would like to keep it that way. What can I do here?
Here's a link to the actual code snippet. Thanks for reading.
Upvotes: 2
Views: 824
Reputation: 320
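I don't think this is a problem with the interpolation itself; it follows from the method you chose. The appended row lies beyond the last known data point, so 'pchip' has to extrapolate, and for the 'Freedom' column that gives a negative value regardless. Taking the values from your DataFrame: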
import numpy as np
import scipy.interpolate
y = np.array([0.196620, 0.147062, 0.112000, 0.010000])
x = np.array([0, 1, 2, 3])
pchip_obj = scipy.interpolate.PchipInterpolator(x, y)
print(pchip_obj(4))
The result is -0.126. If you need a non-negative result, you would be better off using a different method.
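To make the difference between interpolating and extrapolating concrete, here is a minimal sketch using the same 'Freedom' values. Inside the data range PCHIP is shape-preserving, so it stays between the smallest and largest known value; outside the range the last cubic piece is simply continued, which is where the negative number comes from. The clamp at zero at the end is only one possible workaround and my own assumption, not an option of df.interpolate.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

x = np.array([0, 1, 2, 3])
# "Freedom" values from the question
y = np.array([0.196620, 0.147062, 0.112000, 0.010000])

# PchipInterpolator extrapolates beyond the data unless told otherwise
pchip = PchipInterpolator(x, y)

# Inside [0, 3] the curve never leaves the range of the known values
inside = pchip(np.linspace(0, 3, 301))

# At x = 4 the last cubic segment is continued past the data
outside = float(pchip(4))

# Hypothetical workaround (an assumption): clamp the estimate at zero
clamped = max(outside, 0.0)

print(inside.min(), inside.max(), outside, clamped)
```

If you clamp a component like this, "Score" would have to be recomputed from the components afterwards to remain their sum; whether a clamped estimate is meaningful for your data is a separate question.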
Upvotes: 1