Reputation: 21
I am trying to use the impliedVolatility function in df_spx.apply() while hardcoding the variable inputs S, K, r, price, T, payoff, and c_or_p. However, it does not work. The same function impliedVolatility only works when I combine lambda + apply.
[code link][1]
# first version of code
S = SPX_spot
K = df_spx['strike_price']
r = df_spx['r']
price = df_spx['mid_price']
T = df_spx['T_years']
payoff = df_spx['cp_flag']
c_or_p = df_spx["cp_flag"]
df_spx["iv"] = df_spx.apply(impliedVolatility(c_or_p, S, K, T, r,price),axis=1)
# second version of code
df_spx["impliedvol"] = df_spx.apply(
    lambda r: impliedVolatility(r["cp_flag"],
                                S,
                                r["strike_price"],
                                r['T_years'],
                                r["r"],
                                r["mid_price"]),
    axis = 1)
[1]: https://i.sstatic.net/yBfO5.png
Upvotes: 1
Views: 103
Reputation: 1490
You have to give apply a function that it can call; it needs a callable. In your first example
df_spx.apply(impliedVolatility(c_or_p, S, K, T, r,price), axis=1)
you are passing the result of calling the function to apply, not the function itself. That does not work. If you instead wrote
df_spx.apply(impliedVolatility, c_or_p=c_or_p, S=S, K=K, T=T, r=r, price=price, axis=1)
if the function's keyword arguments have the same names, or if you wrote
df_spx.apply(impliedVolatility, args=(c_or_p, S, K, T, r,price), axis=1)
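then it might work. Note that apply with axis=1 always passes the current row as the first positional argument and forwards args and the keyword arguments after it, so this only works if impliedVolatility is written to take the row first. Notice that we are not calling impliedVolatility inside apply; we are passing the function itself as an argument.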
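As a minimal sketch of the difference between passing a function and passing its result: the `intrinsic_value` function, the `spot` value, and the two-row frame below are hypothetical stand-ins (the real `impliedVolatility` is not shown in the question), and the function is deliberately written to take the row first.

```python
import pandas as pd

# Hypothetical two-row frame standing in for df_spx.
df = pd.DataFrame({"strike_price": [100.0, 110.0], "mid_price": [5.0, 2.5]})


def intrinsic_value(row, spot):
    """Hypothetical row-level function: call payoff at the given spot."""
    return max(spot - row["strike_price"], 0.0)


spot = 105.0

# The function object is callable; the value it returns is not.
print(callable(intrinsic_value))                    # True
print(callable(intrinsic_value(df.iloc[0], spot)))  # False

# apply passes each row first, then forwards the extras.
via_args = df.apply(intrinsic_value, args=(spot,), axis=1)
via_kwargs = df.apply(intrinsic_value, spot=spot, axis=1)
print(via_args.tolist())
```

Both calls hand apply the function itself; the extra input travels through `args` or as a keyword argument instead of being baked into a call.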
Upvotes: 3
Reputation: 1432
There is already a pretty good answer, but here is a different perspective. apply loops over your data and calls the function you provide on each piece.
Say you have:
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3], "b": list("asd")})
df
Out:
   a  b
0  1  a
1  2  s
2  3  d
If you want to create new data or transform one of the columns, you might consider using apply. (You can also work on entire rows, which is your use case, but let's keep it simple for now.) Say you just wanted to multiply every value by two:
def multiply_by_two(val):
    return val * 2
df.b.apply(multiply_by_two) # case 1
Out:
0 aa
1 ss
2 dd
df.a.apply(multiply_by_two) # case 2
Out:
0 2
1 4
2 6
The first example turns each one-letter string into a doubled string, while the second simply doubles each number. You should avoid apply in the second case, because for a simple arithmetic operation it is much slower than the vectorized df.a * 2. Hence my rule of thumb: use apply when operating on non-numeric objects (case 1). NOTE: there is no need for a lambda in this simple case.
So what apply does is pass each element of the series to the function.
Now, if you apply on an entire dataframe, each value passed is a slice of the data as a series (one row when axis=1). Hence, to apply your function properly you need to map the fields of that series to the function's parameters. For instance:
def add_2_to_a_multiply_b(b, a):
    return (a + 2) * b
# ERROR: *row unpacks the values in column order (a, b), so b receives the integer and a receives the string, and you can't add a string and an integer (see `add_2_to_a_multiply_b`)
df.apply(lambda row: add_2_to_a_multiply_b(*row), axis=1)
# OK: pick each field by name
df.apply(lambda row: add_2_to_a_multiply_b(row['b'], row['a']), axis=1)
Out:
0 aaa
1 ssss
2 ddddd
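To see what the function actually receives when apply runs with axis=1, here is a small sketch; the `inspect_row` helper and the `seen` list are hypothetical names used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": list("asd")})

seen = []  # records what each call receives


def inspect_row(row):
    """Hypothetical helper: note the type and labels of the row, return column a."""
    seen.append((type(row).__name__, list(row.index)))
    return row["a"]


result = df.apply(inspect_row, axis=1)
print(seen[0])          # ('Series', ['a', 'b'])
print(result.tolist())  # [1, 2, 3]
```

Each row arrives as a Series whose index holds the column names, which is why picking fields by name, as in row['b'], is more robust than relying on unpacking order.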
From this point on you can build more complex implementations, for instance using partial functions:
from functools import partial

def add_to_a_multiply_b(b, a, *, val_to_add):
    return (a + val_to_add) * b

# fix val_to_add once, then reuse the specialized function for every row
specialized_func = partial(add_to_a_multiply_b, val_to_add=2)
df.apply(lambda row: specialized_func(row['b'], row['a']), axis=1)
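A possible variation, sketched under the assumption that you are free to write the function so that it takes the whole row: the partial can then be handed to apply directly, without a lambda. The name `add_to_a_multiply_b_row` is hypothetical.

```python
from functools import partial

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": list("asd")})


def add_to_a_multiply_b_row(row, *, val_to_add):
    """Hypothetical row-level variant: repeat b (a + val_to_add) times."""
    return row["b"] * (row["a"] + val_to_add)


# partial fixes the keyword; apply supplies the row as the only positional argument.
result = df.apply(partial(add_to_a_multiply_b_row, val_to_add=2), axis=1)
print(result.tolist())  # ['aaa', 'ssss', 'ddddd']
```

The partial object is itself a callable, so it satisfies what apply expects.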
Just to stress it again, avoid apply if performance matters:
# 'OK-ISH', does the job... but
def strike_price_minus_mid_price(strike_price, mid_price):
    return strike_price - mid_price
new_data = df.apply(lambda r: strike_price_minus_mid_price(r["strike_price"], r["mid_price"] ), axis=1)
vs
# 'BETTER'
new_data = df["strike_price"] - df["mid_price"]
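As a quick check that the two forms agree, here is a sketch on a tiny frame; the three rows of values are made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Made-up values; only the column names are taken from the question.
df_spx = pd.DataFrame(
    {
        "strike_price": [4000.0, 4100.0, 4200.0],
        "mid_price": [150.5, 90.25, 45.0],
    }
)

# Row by row: one Python-level function call per row.
row_wise = df_spx.apply(lambda r: r["strike_price"] - r["mid_price"], axis=1)

# Vectorized: a single operation over whole columns.
vectorized = df_spx["strike_price"] - df_spx["mid_price"]

pd.testing.assert_series_equal(row_wise, vectorized)
print(vectorized.tolist())  # [3849.5, 4009.75, 4155.0]
```

The results are identical, so the choice comes down to speed and readability; the vectorized form avoids calling a Python function once per row.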
Upvotes: 1