Nicolai B. Thomsen

Reputation: 884

Prevent negative values in df.interpolate()

I'm having trouble avoiding negative values when interpolating. I have the following data in a DataFrame:

current_country = 

idx Country     Region              Rank    Score     GDP capita    Family   Life Expect.    Freedom    Trust Gov.  Generosity  Residual    Year

289 South Sudan Sub-Saharan Africa  143     3.83200     0.393940    0.185190    0.157810    0.196620    0.130150    0.258990    2.509300    2016
449 South Sudan Sub-Saharan Africa  147     3.59100     0.397249    0.601323    0.163486    0.147062    0.116794    0.285671    1.879416    2017
610 South Sudan Sub-Saharan Africa  154     3.25400     0.337000    0.608000    0.177000    0.112000    0.106000    0.224000    1.690000    2018
765 South Sudan Sub-Saharan Africa  156     2.85300     0.306000    0.575000    0.295000    0.010000    0.091000    0.202000    1.374000    2019

I want to interpolate the following year (2015), shown below, using pandas' df.interpolate():

new_row =

idx Country     Region              Rank    Score   GDP capita  Family     Life Expect.  Freedom    Trust Gov.  Generosity  Residual    Year

593 South Sudan Sub-Saharan Africa  0       np.nan  np.nan      np.nan     np.nan        np.nan     np.nan      np.nan      np.nan      2015

I create a DataFrame with NaN in every column to be interpolated (as above), append it to the original DataFrame, and then interpolate to fill the NaN cells:

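import pandas as pd

interpol_subset = pd.concat([current_country, new_row])  # DataFrame.append was removed in pandas 2.0
interpol_subset = interpol_subset.interpolate(method="pchip", order=2)  # order only matters for "spline"/"polynomial"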

This produces the following DataFrame:

idx Country     Region              Rank    Score     GDP capita    Family   Life Expect.    Freedom    Trust Gov.  Generosity  Residual    Year

289 South Sudan Sub-Saharan Africa  143     3.83200     0.393940    0.185190    0.157810    0.196620    0.130150    0.258990    2.509300    2016
449 South Sudan Sub-Saharan Africa  147     3.59100     0.397249    0.601323    0.163486    0.147062    0.116794    0.285671    1.879416    2017
610 South Sudan Sub-Saharan Africa  154     3.25400     0.337000    0.608000    0.177000    0.112000    0.106000    0.224000    1.690000    2018
765 South Sudan Sub-Saharan Africa  156     2.85300     0.306000    0.575000    0.295000    0.010000    0.091000    0.202000    1.374000    2019
4   South Sudan Sub-Saharan Africa  0       2.39355     0.313624    0.528646    0.434473   -0.126247    0.072480    0.238480    0.963119    2015
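A minimal reproduction with only the "Freedom" and "Year" columns helps show where the negative value comes from: the 2015 row is appended after the 2019 row, so pchip has to extrapolate past the last known point instead of filling a gap between two known points. This sketch assumes the concatenation used `ignore_index=True` (or an index reset), which is what the index 4 in the output suggests; the original code behind the missing link may differ.

```python
import numpy as np
import pandas as pd

# Two columns from the question; the index labels follow the "idx" column.
current_country = pd.DataFrame(
    {"Freedom": [0.196620, 0.147062, 0.112000, 0.010000],
     "Year": [2016, 2017, 2018, 2019]},
    index=[289, 449, 610, 765],
)
new_row = pd.DataFrame({"Freedom": [np.nan], "Year": [2015]}, index=[593])

# Assumption: ignore_index=True, matching the 0..4 index seen in the output.
# The 2015 row lands in last place, at position 4, after the 2019 row.
interpol_subset = pd.concat([current_country, new_row], ignore_index=True)

# pchip sees x = 0, 1, 2, 3 for the known values and x = 4 for the NaN, so the
# fill is an extrapolation beyond the last data point, not an interpolation.
interpol_subset = interpol_subset.interpolate(method="pchip")
print(interpol_subset)
```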

The issue: in the last row, the value in "Freedom" is negative. Is there a way to parameterize df.interpolate so that it doesn't produce negative values? I can't find anything in the documentation. Apart from that negative value, I'm fine with the estimates (although they're a bit skewed).

I considered simply flipping the negative value to a positive one, but the "Score" value is the sum of all the other continuous features, and I would like to keep it that way. What can I do here?
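To check the "Score is the sum of the other features" constraint on the numbers above, the sketch below computes Score minus the sum of the seven component columns, using the values copied from the interpolated output. For 2016 to 2019 the gap is essentially zero. For the filled 2015 row the components add up to about 2.4246 rather than 2.39355, because each column is filled independently and PCHIP is not linear in the data, so the filled parts need not sum to the filled Score. This is an observation on these particular numbers, not documented pandas behaviour.

```python
import pandas as pd

# Rows copied from the interpolated output in the question (2016-2019, then 2015).
df = pd.DataFrame({
    "Year":         [2016, 2017, 2018, 2019, 2015],
    "Score":        [3.83200, 3.59100, 3.25400, 2.85300, 2.39355],
    "GDP capita":   [0.393940, 0.397249, 0.337000, 0.306000, 0.313624],
    "Family":       [0.185190, 0.601323, 0.608000, 0.575000, 0.528646],
    "Life Expect.": [0.157810, 0.163486, 0.177000, 0.295000, 0.434473],
    "Freedom":      [0.196620, 0.147062, 0.112000, 0.010000, -0.126247],
    "Trust Gov.":   [0.130150, 0.116794, 0.106000, 0.091000, 0.072480],
    "Generosity":   [0.258990, 0.285671, 0.224000, 0.202000, 0.238480],
    "Residual":     [2.509300, 1.879416, 1.690000, 1.374000, 0.963119],
})

components = ["GDP capita", "Family", "Life Expect.", "Freedom",
              "Trust Gov.", "Generosity", "Residual"]

# Score minus the sum of its parts; close to 0 when the sum constraint holds.
df["gap"] = df["Score"] - df[components].sum(axis=1)
print(df[["Year", "Score", "gap"]].round(6))
```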

Here's a link to the actual code snippet. Thanks for reading.

Upvotes: 2

Views: 824

Answers (1)

Pei Li

Reputation: 320

I doubt this is an issue with interpolation as such; the main cause is the method you are using. 'pchip' itself returns a negative value for 'Freedom', even when called directly through SciPy. If we take the values from your DataFrame:

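import numpy as np
import scipy.interpolate

# Freedom values for 2016-2019, at row positions 0-3
y = np.array([0.196620, 0.147062, 0.112000, 0.010000])
x = np.array([0, 1, 2, 3])
pchip_obj = scipy.interpolate.PchipInterpolator(x, y)
# The appended row sits at position 4, beyond the last data point, so this extrapolates
print(pchip_obj(4))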

The result is about -0.126, the same value you got. If you want a positive result, you should change the method you are using.
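A quick way to see how much the method choice matters is to fill the same trailing NaN in "Freedom" with method="pchip" and with pandas' default method="linear". This is a sketch for comparison, not a recommendation of a particular method. For a trailing NaN, "linear" has no point to the right to interpolate toward, so it repeats the last observed value (0.01 here): non-negative, but hardly an estimate.

```python
import numpy as np
import pandas as pd

# "Freedom" for 2016-2019 plus the appended row, on a default 0..4 index.
freedom = pd.Series([0.196620, 0.147062, 0.112000, 0.010000, np.nan])

# Fill the trailing NaN with each method and keep the filled value.
results = {m: freedom.interpolate(method=m).iloc[-1] for m in ["pchip", "linear"]}
for method, value in results.items():
    print(f"{method:>6}: {value:.6f}")
```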

Upvotes: 1
