Pandas - subtracting 2 variable after group by

Question

I have the following Pandas "code"

df=pd.read_csv('option_data.csv')

In [30]:

df.head(9)

Out[30]:
     S    date    E   D     V
0  IBM  1/2/2008  1  10  0.1718
1  IBM  1/2/2008  1  50  0.2144
2  IBM  1/2/2008  1  90  0.2733
3  IBM  1/3/2008  1  10  0.1692
4  IBM  1/3/2008  1  50  0.2081
5  IBM  1/3/2008  1  90  0.2634
6  IBM  1/4/2008  1  10  0.1844
7  IBM  1/4/2008  1  50  0.2283
8  IBM  1/4/2008  1  90  0.2779

I wish to create a variable that is the spread between V when D = 90 and V when D=10. The first value would be (.2733 - .1718) = .1015. Example below:

S   date        E   D      V    Skew
IBM 1/2/2008    1   10  0.1718  
IBM 1/2/2008    1   50  0.2144  0.1015
IBM 1/2/2008    1   90  0.2733  
IBM 1/3/2008    1   10  0.1692  
IBM 1/3/2008    1   50  0.2081  0.0942
IBM 1/3/2008    1   90  0.2634  
IBM 1/4/2008    1   10  0.1844  
IBM 1/4/2008    1   50  0.2283  0.0935
IBM 1/4/2008    1   90  0.2779

the last variable is the change in the "Skew variable" as a percent of V when D=50 for the first Skew number. So, (0.0942-0.1015)/0.2144 = -.03405

S   date        E   D   V       Skew    pct_change_Skew
IBM 1/2/2008    1   10  0.1718      
IBM 1/2/2008    1   50  0.2144  0.1015  
IBM 1/2/2008    1   90  0.2733      
IBM 1/3/2008    1   10  0.1692      
IBM 1/3/2008    1   50  0.2081  0.0942  -0.03405
IBM 1/3/2008    1   90  0.2634      
IBM 1/4/2008    1   10  0.1844      
IBM 1/4/2008    1   50  0.2283  0.0935  -0.00001
IBM 1/4/2008    1   90  0.2779

My setup is below:

df = df.groupby(['S','date','E']).apply(????)

I need to perform the groupby as there are many S (symbols), dates and E values.

Ideally I would have the new variables populate each row as such:

S   date        E   D     V      Skew   pct_change_Skew
IBM 1/2/2008    1   10  0.1718  0.1015  
IBM 1/2/2008    1   50  0.2144  0.1015  
IBM 1/2/2008    1   90  0.2733  0.1015  
IBM 1/3/2008    1   10  0.1692  0.0942  -0.03405
IBM 1/3/2008    1   50  0.2081  0.0942  -0.03405
IBM 1/3/2008    1   90  0.2634  0.0942  -0.03405
IBM 1/4/2008    1   10  0.1844  0.0935  -0.00001
IBM 1/4/2008    1   50  0.2283  0.0935  -0.00001
IBM 1/4/2008    1   90  0.2779  0.0935  -0.00001

If I can be so bold as to ask for a pandas as well as a python "straight" code example, it would help my learning curve. Thank you very much for any assistance in this.

John

Jianxun Li · Accepted Answer

To calculate the new 'Skew' column, you can do a groupby and define your customized apply function. To calculate pct_change, you can use the .shift() operator.

import pandas as pd

df

Out[31]: 
     S      date  E   D       V
0  IBM  1/2/2008  1  10  0.1718
1  IBM  1/2/2008  1  50  0.2144
2  IBM  1/2/2008  1  90  0.2733
3  IBM  1/3/2008  1  10  0.1692
4  IBM  1/3/2008  1  50  0.2081
5  IBM  1/3/2008  1  90  0.2634
6  IBM  1/4/2008  1  10  0.1844
7  IBM  1/4/2008  1  50  0.2283
8  IBM  1/4/2008  1  90  0.2779

def calculate_skew(group):
    group['Skew'] = group.loc[group.D==90, 'V'].values[0] - group.loc[group.D==10, 'V'].values[0]
    return group
# get the new Skew column
df = df.groupby(['S','date']).apply(calculate_skew)
# calculate pct_change
df['Skew_lag3'] = df.Skew.shift(3)
df['Skew_pct_change'] = (df.Skew - df.Skew_lag3)/df.Skew_lag3

Out[33]: 
     S      date  E   D       V    Skew  Skew_lag3  Skew_pct_change
0  IBM  1/2/2008  1  10  0.1718  0.1015        NaN              NaN
1  IBM  1/2/2008  1  50  0.2144  0.1015        NaN              NaN
2  IBM  1/2/2008  1  90  0.2733  0.1015        NaN              NaN
3  IBM  1/3/2008  1  10  0.1692  0.0942     0.1015          -0.0719
4  IBM  1/3/2008  1  50  0.2081  0.0942     0.1015          -0.0719
5  IBM  1/3/2008  1  90  0.2634  0.0942     0.1015          -0.0719
6  IBM  1/4/2008  1  10  0.1844  0.0935     0.0942          -0.0074
7  IBM  1/4/2008  1  50  0.2283  0.0935     0.0942          -0.0074
8  IBM  1/4/2008  1  90  0.2779  0.0935     0.0942          -0.0074

Pandas - subtracting 2 variable after group by

Answers (1)

Related Questions