Reputation: 341
I have a dataframe with ResidMat and Price columns. I use scipy's CubicSpline to interpolate a price for each row, and I apply it to every row of my dataset. It's not very fast, even though this example contains very little data. My real dataset will have more than a hundred entries, and it becomes very slow. Do you have an idea how to do this faster, maybe with a matrix?
Thank you.
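import pandas as pd
from scipy.interpolate import CubicSpline

def add_interpolated_price(row, generic_residmat):
    # Column labels are duplicated, so row[['ResidMat']] returns every ResidMat value in the row
    residmats = row[['ResidMat']].values
    prices = row[['Price']].values
    # Fit a cubic spline through the (ResidMat, Price) points and evaluate it at generic_residmat
    cs = CubicSpline(residmats, prices)
    return float(cs(generic_residmat))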
df = pd.DataFrame(
    [[1, 18, 38, 58, 83, 103, 128, 148, 32.4, 32.5, 33.8, 33.5, 32.8, 32.4, 32.7],
     [2, 17, 37, 57, 82, 102, 127, 147, 31.2, 31.5, 32.7, 33.2, 32.5, 32.9, 33.3]],
    columns=['index'] + ['ResidMat'] * 7 + ['Price'] * 7,
    index=['2010-06-25', '2010-06-28'])
my_resimmat = 30
df['Generic_Value'] = df.apply(lambda row: add_interpolated_price(row, generic_residmat=my_resimmat), axis=1)
Upvotes: 1
Views: 417
Reputation: 2696
After profiling this code, most of the time is spent in the interpolation itself, so the best thing I would suggest is pandarallel. The question "Make Pandas DataFrame apply() use all cores?" has the details. My favorite is this method (outline code below):
from pandarallel import pandarallel
import numpy as np

pandarallel.initialize()

def func(x):
    # Placeholder row function; np.sin works element-wise on a row, unlike math.sin
    return np.sin(x**2)

df.parallel_apply(func, axis=1)
Note that this only works on Linux and macOS; on Windows, pandarallel works only if the Python session is run from Windows Subsystem for Linux (WSL).
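Since the question asks about a matrix approach, here is a rough sketch of two alternatives that skip `DataFrame.apply` and work on NumPy arrays instead. The helper names `interpolate_rows` and `interpolate_shared_grid` are made up for illustration. The first still fits one spline per row, because each row in the sample data has its own ResidMat grid. The second fits a single spline for all rows at once, but only under the assumption that every row shares the same ResidMat grid, which is not true for the sample data. I have not benchmarked either against the original.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline


def interpolate_rows(residmats, prices, target):
    """One spline per row; works when every row has its own ResidMat grid."""
    return np.array([float(CubicSpline(x, y)(target))
                     for x, y in zip(residmats, prices)])


def interpolate_shared_grid(grid, prices, target):
    """Single spline for all rows. ASSUMPTION: all rows share the same grid."""
    # CubicSpline accepts 2-D y; axis=0 is the interpolation axis, so transpose
    values = np.asarray(prices, dtype=float).T
    return CubicSpline(grid, values, axis=0)(target)


# Sample data from the question (the 'index' column is left out here)
df = pd.DataFrame(
    [[18, 38, 58, 83, 103, 128, 148, 32.4, 32.5, 33.8, 33.5, 32.8, 32.4, 32.7],
     [17, 37, 57, 82, 102, 127, 147, 31.2, 31.5, 32.7, 33.2, 32.5, 32.9, 33.3]],
    columns=['ResidMat'] * 7 + ['Price'] * 7,
    index=['2010-06-25', '2010-06-28'])

# Duplicate labels: df['ResidMat'] selects all seven ResidMat columns at once
residmats = df['ResidMat'].to_numpy(dtype=float)  # shape (n_rows, 7)
prices = df['Price'].to_numpy(dtype=float)        # shape (n_rows, 7)

generic_values = interpolate_rows(residmats, prices, 30)
df['Generic_Value'] = generic_values
```

If your real data does share one grid across rows, `interpolate_shared_grid(grid, prices, 30)` would return one value per row from a single `CubicSpline` call. Otherwise the per-row version at least avoids building a pandas Series for every row.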
Upvotes: 1