Reputation: 139
I have a dataframe with two columns in Python. I want to look up a value in one column and get the corresponding value from the second column. Sometimes the lookup value matches a row exactly, but it can also fall between two rows.
I have this example dataframe:
    x    y
0   0    0
1  10  100
2  20  200
I want to find the value of y for a given value of x. For example, if I look up x = 10, I should get 100. But if I look up 15, I need to interpolate between the two neighbouring values of y. Is there a function that does this?
Upvotes: 3
Views: 10627
Reputation: 149185
numpy.interp is probably the simplest way to do linear interpolation here:
import numpy as np

def interpolate(xval, df, xcol, ycol):
    # Return the y value at xval by linear interpolation, where df is a dataframe,
    # df[xcol] holds the x coordinates and df[ycol] holds the y coordinates.
    # df[xcol] is expected to be sorted in increasing order.
    return float(np.interp(xval, df[xcol], df[ycol]))
With your example data it gives:
>>> interpolate(10, df, 'x', 'y')
100.0
>>> interpolate(15, df, 'x', 'y')
150.0
You can even directly do:
>>> np.interp([10, 15], df.x, df.y)
array([100., 150.])
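Two things to keep in mind: np.interp expects the x coordinates to be increasing, and by default it returns the first or last y value for queries outside the data range. Here is a minimal sketch, assuming you would rather get NaN outside the range (the helper name interpolate_sorted is made up for illustration):

```python
import numpy as np
import pandas as pd

def interpolate_sorted(xval, df, xcol="x", ycol="y"):
    # Hypothetical helper: sort by the x column first,
    # because np.interp expects increasing x coordinates.
    ordered = df.sort_values(xcol)
    # left/right are the values returned for queries below/above the data range.
    return np.interp(xval, ordered[xcol], ordered[ycol], left=np.nan, right=np.nan)

# Deliberately unsorted version of the example data
df = pd.DataFrame({"x": [20, 0, 10], "y": [200, 0, 100]})

# 15 lies between two rows, 25 is outside the data range
result = interpolate_sorted([15, 25], df)
```

Here result[0] is the interpolated value for 15, and result[1] is NaN because 25 is above the largest x.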
Upvotes: 5
Reputation: 5500
You can have a look at the interpolate method provided by the pandas module (doc), but I'm not sure it answers your question.
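For what it's worth, here is one way the pandas method could be applied to the question. This is only a sketch, assuming x can be used as the index and that the query point (15 here) is added as an empty row before interpolating:

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 10, 20], "y": [0, 100, 200]})

# Use x as the index so that the interpolation is done with respect to x.
s = df.set_index("x")["y"]

# Add the query point as a new row (its y is NaN after reindexing),
# then fill it using the index values as the x coordinates.
query = 15
s = s.reindex(s.index.union([query])).interpolate(method="index")

value = s.loc[query]
```

method="index" matters here: the default "linear" method ignores the index and treats the rows as equally spaced.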
You can do it with interp1d from the scipy module (scipy.interpolate). Several kinds of interpolation are available: 'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic'... You can find the full list on the doc page.
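As a quick illustration of the kind argument, here is a small sketch on the example data from the question. The query value 12 is an arbitrary choice to make the two results differ:

```python
import pandas as pd
from scipy.interpolate import interp1d

df = pd.DataFrame({"x": [0, 10, 20], "y": [0, 100, 200]})

# Build one interpolation function per kind
f_linear = interp1d(df["x"], df["y"], kind="linear")
f_nearest = interp1d(df["x"], df["y"], kind="nearest")

# Evaluate both at the same point
linear_value = float(f_linear(12))    # straight line between (10, 100) and (20, 200)
nearest_value = float(f_nearest(12))  # y of the closest x, which is x = 10
```

Note that by default interp1d raises a ValueError for points outside the range of x.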
The interpolation process can be summarised as three steps:
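1. Split the data into rows with and without missing values, using isna (doc).
2. Create the interpolation function from the rows without missing values, using interp1d (doc).
3. Interpolate the missing values by evaluating that function at their x.
Here is the code: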
# Import modules
import pandas as pd
import numpy as np
from scipy.interpolate import interp1d
# Data
df = pd.DataFrame(
    [[0, 0],
     [10, 100],
     [11, np.nan],
     [15, np.nan],
     [17, np.nan],
     [20, 200]],
    columns=["x", "y"])
print(df)
#     x      y
# 0   0    0.0
# 1  10  100.0
# 2  11    NaN
# 3  15    NaN
# 4  17    NaN
# 5  20  200.0
# Split the data into training rows (no NaN values) and missing rows (NaN values)
missing = df.isna().any(axis=1)
df_training = df[~missing]
df_missing = df[missing].reset_index(drop=True)
# Create a function that interpolates the missing values (built from the training rows)
f = interp1d(df_training.x, df_training.y)
# Interpolate the missing values
df_missing["y"] = f(df_missing.x)
print(df_missing)
#     x      y
# 0  11  110.0
# 1  15  150.0
# 2  17  170.0
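If you would rather fill the gaps in the original dataframe than build a separate df_missing, a variant along these lines should work. It is a sketch under the same assumptions as above (only y contains NaN, and every missing x lies inside the range of the known x values):

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

df = pd.DataFrame({"x": [0, 10, 11, 15, 17, 20],
                   "y": [0, 100, np.nan, np.nan, np.nan, 200]})

# Boolean mask of the rows whose y is missing
missing = df["y"].isna()

# Build the interpolation function from the known rows only
f = interp1d(df.loc[~missing, "x"], df.loc[~missing, "y"])

# Write the interpolated values back into the original dataframe
df.loc[missing, "y"] = f(df.loc[missing, "x"])
```

After this, df keeps its six rows and the y column no longer contains NaN.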
You can find other work on this topic at this link.
Upvotes: 2