Ipa

Reputation: 139

How can I interpolate values in a python dataframe?

I have a dataframe with two columns in Python. I want to look up a value in one column and get the corresponding value from the second column. Sometimes the lookup value matches a row exactly, but it can also fall between two rows.

I have this example dataframe:

    x   y
0   0   0
1   10  100
2   20  200

I want to find the value of y for a given value of x. For example, if I look up x = 10, I get y = 100. But if I look up x = 15, I need to interpolate between the two neighbouring values of y. Is there a function to do this?

Upvotes: 3

Views: 10627

Answers (2)

Serge Ballesta

Reputation: 149185

numpy.interp is probably the simplest way to do linear interpolation here:

import numpy as np

def interpolate(xval, df, xcol, ycol):
    # Return the y value at xval by linear interpolation, where df is a dataframe,
    # df[xcol] holds the x coordinates and df[ycol] holds the y coordinates.
    # df[xcol] is expected to be sorted in increasing order.
    return float(np.interp(xval, df[xcol], df[ycol]))

With your example data it gives:

>>> interpolate(10, df, 'x', 'y')
100.0
>>> interpolate(15, df, 'x', 'y')
150.0

You can even directly do:

>>> np.interp([10, 15], df.x, df.y)
array([100., 150.])
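Two details of np.interp are worth knowing before relying on it: it expects the x coordinates to be increasing, and by default it clamps queries outside the data range to the first or last y value. Here is a minimal sketch of both, using the example data from the question (shuffled on purpose; the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

# Example data from the question, deliberately shuffled to show the sort step
df = pd.DataFrame({"x": [20, 0, 10], "y": [200, 0, 100]})

# np.interp expects increasing x coordinates, so sort first
df_sorted = df.sort_values("x")

# Queries inside the data range are interpolated linearly
inside = np.interp([10, 15], df_sorted["x"], df_sorted["y"])

# Queries outside the range are clamped to the first/last y by default
outside = np.interp([-5, 25], df_sorted["x"], df_sorted["y"])

# The left/right arguments replace that default, here with NaN
outside_nan = np.interp([-5, 25], df_sorted["x"], df_sorted["y"],
                        left=np.nan, right=np.nan)

print(inside)       # [100. 150.]
print(outside)      # [  0. 200.]
print(outside_nan)  # [nan nan]
```

Returning NaN for out-of-range queries is usually safer than silent clamping if you would rather notice a lookup that falls outside the table.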

Upvotes: 5

Alexandre B.

Reputation: 5500

You can have a look at the interpolate method provided by pandas (doc), but I'm not sure it answers your question.
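For what it's worth, the pandas interpolate method can be made to do a lookup like this, although it is a bit roundabout. The sketch below assumes x is used as the index and that the query points (a hypothetical list called queries) are added to the index as empty rows before filling:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({"x": [0, 10, 20], "y": [0, 100, 200]})

# Use x as the index so interpolation can be weighted by the x values
s = df.set_index("x")["y"]

# Hypothetical query points; 15 is not in the table
queries = [10, 15]

# Add the query points to the index (new rows get NaN), then fill them.
# method="index" interpolates using the numeric index values, not row positions.
filled = s.reindex(s.index.union(queries)).interpolate(method="index")

result = filled.loc[queries]
print(result)
# x
# 10    100.0
# 15    150.0
# Name: y, dtype: float64
```

Note that the default method="linear" treats rows as equally spaced and ignores the index, so method="index" is what makes the result depend on x.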

You can do it with interp1d from the scipy module (scipy.interpolate). Several kinds of interpolation are possible: ‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’... You can find the full list on the doc page.
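To give an idea of how the kind argument changes the result, here is a small sketch on the example data from the question. The query points 12 and 18 are arbitrary choices for illustration, and fill_value="extrapolate" is shown only as one way to handle queries outside the data range (by default interp1d raises an error for them):

```python
import numpy as np
from scipy.interpolate import interp1d

# Example data from the question
x = np.array([0, 10, 20])
y = np.array([0, 100, 200])

# Same data, two different kinds of interpolation
f_linear = interp1d(x, y, kind="linear")
f_nearest = interp1d(x, y, kind="nearest")

queries = [12, 18]
linear_vals = f_linear(queries)    # straight line between neighbouring points
nearest_vals = f_nearest(queries)  # y of the closest x in the table

print(linear_vals)   # [120. 180.]
print(nearest_vals)  # [100. 200.]

# Optional: allow queries outside [0, 20] by extending the end segments
f_extrap = interp1d(x, y, kind="linear", fill_value="extrapolate")
extrap_val = float(f_extrap(25))
print(extrap_val)    # 250.0
```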

The interpolation process can be summarised as three steps:

  1. Split your data into missing and non-missing values. I use isna (doc).
  2. Create the interpolation function from the data without missing values. I use interp1d (doc).
  3. Interpolate (predict the missing values). Just call the function created in step 2 on the x column of the missing data.

Here is the code:

# Import modules
import pandas as pd
import numpy as np
from scipy.interpolate import interp1d

# Data
df = pd.DataFrame(
    [[0,   0],
     [10, 100],
     [11, np.nan],
     [15, np.nan],
     [17, np.nan],
     [20,   200]],
    columns=["x", "y"])
print(df)
#     x      y
# 0   0    0.0
# 1  10  100.0
# 2  11    NaN
# 3  15    NaN
# 4  17    NaN
# 5  20  200.0

# Split the data into training (non-NaN values) and missing (NaN values)
missing = df.isna().any(axis=1)
df_training = df[~missing]
df_missing = df[missing].reset_index(drop=True)

# Create a function that interpolates missing values (from our training values)
f = interp1d(df_training.x, df_training.y)

# Interpolate the missing values
df_missing["y"] = f(df_missing.x)
print(df_missing)
#     x      y
# 0  11  110.0
# 1  15  150.0
# 2  17  170.0
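If the goal is to fill the gaps in the original dataframe rather than build a separate one, the same three steps can be sketched with a boolean mask and np.interp. This is a variation rather than the approach above, it assumes x is already sorted, and the name known is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0, 10, 11, 15, 17, 20],
                   "y": [0, 100, np.nan, np.nan, np.nan, 200]})

# Step 1: mark the rows whose y is known
known = df["y"].notna()

# Steps 2 and 3: interpolate y at the x of the missing rows, using the known rows,
# and write the result back into the same dataframe
df.loc[~known, "y"] = np.interp(df.loc[~known, "x"],
                                df.loc[known, "x"],
                                df.loc[known, "y"])
print(df)
#     x      y
# 0   0    0.0
# 1  10  100.0
# 2  11  110.0
# 3  15  150.0
# 4  17  170.0
# 5  20  200.0
```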

You can find other work on the topic at this link.

Upvotes: 2
