Reputation: 235
I am trying to apply a basic spline function to every row of a given DataFrame (dfTest, which contains values for the vector x) to obtain a bigger one (dfBigger) that contains the values for the vector xnew (which includes the points of x).
I therefore define the following variables:
import pandas as pd
import numpy as np
x = [0,1,3,5]
xnew = range(0,6)
np.random.seed(123)
dfTest = pd.DataFrame(np.random.rand(12).reshape(3,4))
and the basic spline function:
def spline(y, x, xnew):
    from scipy import interpolate
    model = interpolate.splrep(x, y, s=0.)
    ynew = interpolate.splev(xnew, model)
    result = ynew.round(3)
    return result
which seems to work:
spline(dfTest.iloc[0],x,xnew)
Out[176]: array([ 0.696, 0.286, 0.161, 0.227, 0.388, 0.551])
but when I try to apply it to all rows using:
dfBigger = dfTest.apply(lambda row : spline(row, x, xnew), axis = 1)
I get this:
ValueError: Shape of passed values is (3, 6), indices imply (3, 4)
Since the size of dfBigger is not defined anywhere, I cannot see what is wrong. Any help or comments on this code would be appreciated.
Upvotes: 1
Views: 2187
Reputation: 879501
df.apply(func) tries to build a new Series or DataFrame out of the values returned by func. The shape of the Series or DataFrame depends on the kind of value returned by func. To get a better handle on how df.apply behaves, experiment with the following calls:
dfTest.apply(lambda row: 1, axis=1) # Series
dfTest.apply(lambda row: [1], axis=1) # Series
dfTest.apply(lambda row: [1,2], axis=1) # Series
dfTest.apply(lambda row: [1,2,3], axis=1) # Series
dfTest.apply(lambda row: [1,2,3,4], axis=1) # Series
dfTest.apply(lambda row: [1,2,3,4,5], axis=1) # Series
dfTest.apply(lambda row: np.array([1]), axis=1) # DataFrame
dfTest.apply(lambda row: np.array([1,2]), axis=1) # ValueError
dfTest.apply(lambda row: np.array([1,2,3]), axis=1) # ValueError
dfTest.apply(lambda row: np.array([1,2,3,4]), axis=1) # DataFrame!
dfTest.apply(lambda row: np.array([1,2,3,4,5]), axis=1) # ValueError
dfTest.apply(lambda row: pd.Series([1]), axis=1) # DataFrame
dfTest.apply(lambda row: pd.Series([1,2]), axis=1) # DataFrame
dfTest.apply(lambda row: pd.Series([1,2,3]), axis=1) # DataFrame
dfTest.apply(lambda row: pd.Series([1,2,3,4]), axis=1) # DataFrame
dfTest.apply(lambda row: pd.Series([1,2,3,4,5]), axis=1) # DataFrame
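The outcome of these calls may differ between pandas versions, so it can help to let the installed version report what it does. A minimal sketch, where result_kind is a hypothetical helper name (not part of pandas) and only a few of the cases above are checked:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
dfTest = pd.DataFrame(np.random.rand(12).reshape(3, 4))

def result_kind(func):
    """Hypothetical helper: name the container that apply() builds for func."""
    try:
        out = dfTest.apply(func, axis=1)
    except ValueError:
        # Some pandas versions reject arrays whose length differs from
        # the number of columns.
        return "ValueError"
    return type(out).__name__

kinds = {
    "scalar": result_kind(lambda row: 1),
    "list of 5": result_kind(lambda row: [1, 2, 3, 4, 5]),
    "array of 5": result_kind(lambda row: np.array([1, 2, 3, 4, 5])),
    "Series of 5": result_kind(lambda row: pd.Series([1, 2, 3, 4, 5])),
}

for label, kind in kinds.items():
    print(f"{label:12s} -> {kind}")
```

The scalar and Series cases are the ones to rely on; the list and array cases are the ones worth re-checking after a pandas upgrade.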
So what rules can we draw from these experiments?
- If func returns a scalar or a list, df.apply(func) returns a Series.
- If func returns a Series, df.apply(func) returns a DataFrame.
- If func returns a 1D NumPy array, and the array has only one element, df.apply(func) returns a DataFrame. (Not a terribly useful case...)
- If func returns a 1D NumPy array, and the array has the same number of elements as df has columns, df.apply(func) returns a DataFrame. (Useful, but limited.)

Since func returns 6 values, and you want a DataFrame as the result, the solution is to have func return a Series instead of a NumPy array:
def spline(y, x, xnew):
    ...
    return pd.Series(result)

import numpy as np
import pandas as pd
from scipy import interpolate

def spline(y, x, xnew):
    model = interpolate.splrep(x, y, s=0.)
    ynew = interpolate.splev(xnew, model)
    result = ynew.round(3)
    return pd.Series(result)

x = [0, 1, 3, 5]
xnew = range(0, 6)
np.random.seed(123)
dfTest = pd.DataFrame(np.random.rand(12).reshape(3, 4))
# spline(dfTest.iloc[0], x, xnew)
dfBigger = dfTest.apply(lambda row: spline(row, x, xnew), axis=1)
print(dfBigger)
yields
       0      1      2      3      4      5
0  0.696  0.286  0.161  0.227  0.388  0.551
1  0.719  0.423  0.630  0.981  1.119  0.685
2  0.481  0.392  0.333  0.343  0.462  0.729
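On pandas 0.23 or later, apply also accepts result_type="expand", which turns list-like return values into columns, so spline can keep returning a NumPy array. A minimal sketch, assuming such a pandas version; relabeling the columns with xnew is an optional extra that is not in the code above:

```python
import numpy as np
import pandas as pd
from scipy import interpolate

def spline(y, x, xnew):
    # Same spline as before, but returning a plain NumPy array
    model = interpolate.splrep(x, y, s=0.0)
    return interpolate.splev(xnew, model).round(3)

x = [0, 1, 3, 5]
xnew = range(0, 6)
np.random.seed(123)
dfTest = pd.DataFrame(np.random.rand(12).reshape(3, 4))

# result_type="expand" asks apply to spread each returned array over columns
dfBigger = dfTest.apply(
    lambda row: spline(row, x, xnew), axis=1, result_type="expand"
)

# Optional: label each column with the xnew value it was evaluated at
dfBigger.columns = list(xnew)
print(dfBigger)
```

Returning a Series works on older pandas versions as well, so it is the more portable of the two; result_type="expand" is mainly convenient when func cannot easily be changed.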
Upvotes: 7