Reputation: 521
I have the following Pandas "code"
df=pd.read_csv('option_data.csv')
In [30]:
df.head(9)
Out[30]:
S date E D V
0 IBM 1/2/2008 1 10 0.1718
1 IBM 1/2/2008 1 50 0.2144
2 IBM 1/2/2008 1 90 0.2733
3 IBM 1/3/2008 1 10 0.1692
4 IBM 1/3/2008 1 50 0.2081
5 IBM 1/3/2008 1 90 0.2634
6 IBM 1/4/2008 1 10 0.1844
7 IBM 1/4/2008 1 50 0.2283
8 IBM 1/4/2008 1 90 0.2779
I wish to create a variable that is the spread between V when D = 90 and V when D=10. The first value would be (.2733 - .1718) = .1015. Example below:
S date E D V Skew
IBM 1/2/2008 1 10 0.1718
IBM 1/2/2008 1 50 0.2144 0.1015
IBM 1/2/2008 1 90 0.2733
IBM 1/3/2008 1 10 0.1692
IBM 1/3/2008 1 50 0.2081 0.0942
IBM 1/3/2008 1 90 0.2634
IBM 1/4/2008 1 10 0.1844
IBM 1/4/2008 1 50 0.2283 0.0935
IBM 1/4/2008 1 90 0.2779
the last variable is the change in the "Skew variable" as a percent of V when D=50 for the first Skew number. So, (0.0942-0.1015)/0.2144 = -.03405
S date E D V Skew pct_change_Skew
IBM 1/2/2008 1 10 0.1718
IBM 1/2/2008 1 50 0.2144 0.1015
IBM 1/2/2008 1 90 0.2733
IBM 1/3/2008 1 10 0.1692
IBM 1/3/2008 1 50 0.2081 0.0942 -0.03405
IBM 1/3/2008 1 90 0.2634
IBM 1/4/2008 1 10 0.1844
IBM 1/4/2008 1 50 0.2283 0.0935 -0.00001
IBM 1/4/2008 1 90 0.2779
My setup is below:
df = df.groupby(['S','date','E']).apply(????)
I need to perform the groupby as there are many S (symbols), dates and E values.
Ideally I would have the new variables populate each row as such:
S date E D V Skew pct_change_Skew
IBM 1/2/2008 1 10 0.1718 0.1015
IBM 1/2/2008 1 50 0.2144 0.1015
IBM 1/2/2008 1 90 0.2733 0.1015
IBM 1/3/2008 1 10 0.1692 0.0942 -0.03405
IBM 1/3/2008 1 50 0.2081 0.0942 -0.03405
IBM 1/3/2008 1 90 0.2634 0.0942 -0.03405
IBM 1/4/2008 1 10 0.1844 0.0935 -0.00001
IBM 1/4/2008 1 50 0.2283 0.0935 -0.00001
IBM 1/4/2008 1 90 0.2779 0.0935 -0.00001
If I can be so bold as to ask for a pandas as well as a python "straight" code example, it would help my learning curve. Thank you very much for any assistance in this.
John
Upvotes: 1
Views: 2071
Reputation: 24742
To calculate the new 'Skew' column, you can do a groupby
and define your customized apply
function. To calculate pct_change, you can use the .shift()
operator.
import pandas as pd
df
Out[31]:
S date E D V
0 IBM 1/2/2008 1 10 0.1718
1 IBM 1/2/2008 1 50 0.2144
2 IBM 1/2/2008 1 90 0.2733
3 IBM 1/3/2008 1 10 0.1692
4 IBM 1/3/2008 1 50 0.2081
5 IBM 1/3/2008 1 90 0.2634
6 IBM 1/4/2008 1 10 0.1844
7 IBM 1/4/2008 1 50 0.2283
8 IBM 1/4/2008 1 90 0.2779
def calculate_skew(group):
group['Skew'] = group.loc[group.D==90, 'V'].values[0] - group.loc[group.D==10, 'V'].values[0]
return group
# get the new Skew column
df = df.groupby(['S','date']).apply(calculate_skew)
# calculate pct_change
df['Skew_lag3'] = df.Skew.shift(3)
df['Skew_pct_change'] = (df.Skew - df.Skew_lag3)/df.Skew_lag3
Out[33]:
S date E D V Skew Skew_lag3 Skew_pct_change
0 IBM 1/2/2008 1 10 0.1718 0.1015 NaN NaN
1 IBM 1/2/2008 1 50 0.2144 0.1015 NaN NaN
2 IBM 1/2/2008 1 90 0.2733 0.1015 NaN NaN
3 IBM 1/3/2008 1 10 0.1692 0.0942 0.1015 -0.0719
4 IBM 1/3/2008 1 50 0.2081 0.0942 0.1015 -0.0719
5 IBM 1/3/2008 1 90 0.2634 0.0942 0.1015 -0.0719
6 IBM 1/4/2008 1 10 0.1844 0.0935 0.0942 -0.0074
7 IBM 1/4/2008 1 50 0.2283 0.0935 0.0942 -0.0074
8 IBM 1/4/2008 1 90 0.2779 0.0935 0.0942 -0.0074
Upvotes: 3