MysterioProgrammer91

Reputation: 569

Adjusting weights based on share prices for an investment portfolio using pandas

I have share prices for companies in a portfolio. My goal is to create a new column df['Final_weight'] while keeping the sum of weights for each date and category the same between df['weight'] and df['Final_weight'].

I want to give a weight of 0 to companies whose share price is below the 30th percentile for a particular day against companies in the same category, and a correspondingly higher weight to companies whose share price is above the 70th percentile for that day against companies in the same category.

I have a dataframe with multiple dates and companies with respective category:

Eg subset of df:

Date        Category    Company       Price   weight
1/1/2007    Automative  Audi           1000   0.146
1/1/2007    Automative  Alfa Romeo      400   0.143
1/1/2007    Automative  Aston Martin    500   0.002
1/1/2007    Automative  Bentley        2000   0.025
1/1/2007    Automative  Mercedes       3000   0.063
1/1/2007    Automative  BMW              40   0.154
1/1/2007    Automative  Volvo          3000   0.163
1/1/2007    Automative  VW              200   0.003
1/1/2007    Technology  Apple           400   0.120
1/1/2007    Technology  Microsoft      5500   0.048
1/1/2007    Technology  Google          230   0.069
1/1/2007    Technology  Lenova           36   0.036
1/1/2007    Technology  IBM             250   0.016
1/1/2007    Technology  Sprint          231   0.013

OK, now I have written some code which creates a new column giving the percentile rank of each company per date and per category. The code looks like this:

df['Pctile'] = df.Price.groupby([df.index, df.Category]).rank(pct=True)

Output:

            Category       Company  Price  weight    Pctile
Date                                                       
1/1/2007  Automative          Audi   1000   0.146  0.625000
1/1/2007  Automative    Alfa Romeo    400   0.143  0.375000
1/1/2007  Automative  Aston Martin    500   0.002  0.500000
1/1/2007  Automative       Bentley   2000   0.025  0.750000
1/1/2007  Automative      Mercedes   3000   0.063  0.937500
1/1/2007  Automative           BMW     40   0.154  0.125000
1/1/2007  Automative         Volvo   3000   0.163  0.937500
1/1/2007  Automative            VW    200   0.003  0.250000
1/1/2007  Technology         Apple    400   0.120  0.833333
1/1/2007  Technology     Microsoft   5500   0.048  1.000000
1/1/2007  Technology        Google    230   0.069  0.333333
1/1/2007  Technology        Lenova     36   0.036  0.166667
1/1/2007  Technology           IBM    250   0.016  0.666667
1/1/2007  Technology        Sprint    231   0.013  0.500000
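As a quick sanity check, here is what `rank(pct=True)` does per group on a tiny made-up frame (the column names mirror the ones above; the numbers are illustrative only):

```python
import pandas as pd

# toy frame: percentile rank of Price within each Category
df = pd.DataFrame({
    'Category': ['Auto', 'Auto', 'Auto', 'Tech', 'Tech'],
    'Price':    [100, 300, 200, 50, 75],
})
df['Pctile'] = df.groupby('Category')['Price'].rank(pct=True)
# Auto: 100 -> 1/3, 300 -> 1.0, 200 -> 2/3;  Tech: 50 -> 0.5, 75 -> 1.0
```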

Now I want a final column called df['Final_weight'].

For each date and category I want to apply these 3 rules:

  1. When df['Pctile'] < 0.3, df['Final_weight'] = 0.
  2. When df['Pctile'] is >= 0.3 and <= 0.7, df['Final_weight'] = df['weight'].
  3. When df['Pctile'] > 0.7, df['Final_weight'] = (weight / sum of weights above the 70th percentile) * (sum of weights above the 70th percentile + sum of weights below the 30th percentile).

Here is some sample output and example calculations:

1) sum of weights above the 70th percentile = 0.251
2) sum of weights below the 30th percentile = 0.157

Calculation for Bentley = 0.025 / 0.251 * (0.251 + 0.157) = 0.041

Calculation for Mercedes = 0.063 / 0.251 * (0.251 + 0.157) = 0.102

Calculation for Volvo = 0.163 / 0.251 * (0.251 + 0.157) = 0.265

Now the sums of weight and final_weight for Automative for 1/1/2007 are the same. They both sum to 0.699.

1) sum of weights above the 70th percentile = 0.168
2) sum of weights below the 30th percentile = 0.036

Calculation for Apple = 0.120 / 0.168 * (0.168 + 0.036) = 0.146

Calculation for Microsoft = 0.048 / 0.168 * (0.168 + 0.036) = 0.058

Now the sums of weight and final_weight for Technology for 1/1/2007 are the same. They both sum to 0.302. The sum for that date also remains 1.
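The worked numbers above can be checked directly in plain Python (using the figures quoted for the Automative group):

```python
# Automative, 1/1/2007: weights above the 70th percentile
upper = {'Bentley': 0.025, 'Mercedes': 0.063, 'Volvo': 0.163}
upper_sum = sum(upper.values())    # 0.251
lower_sum = 0.154 + 0.003          # BMW + VW, below the 30th percentile = 0.157

final = {name: w / upper_sum * (upper_sum + lower_sum)
         for name, w in upper.items()}
# Bentley ~0.041, Mercedes ~0.102, Volvo ~0.265

# the scaled-up weights absorb exactly the weight zeroed out below 30%
assert abs(sum(final.values()) - (upper_sum + lower_sum)) < 1e-12
```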

Eg output:

            Category       Company  Price  weight    Pctile  Final_weight
Date                                                       
1/1/2007  Automative          Audi   1000   0.146  0.625000  0.146
1/1/2007  Automative    Alfa Romeo    400   0.143  0.375000  0.143
1/1/2007  Automative  Aston Martin    500   0.002  0.500000  0.002
1/1/2007  Automative       Bentley   2000   0.025  0.750000  0.041
1/1/2007  Automative      Mercedes   3000   0.063  0.937500  0.102
1/1/2007  Automative           BMW     40   0.154  0.125000  0.000
1/1/2007  Automative         Volvo   3000   0.163  0.937500  0.265
1/1/2007  Automative            VW    200   0.003  0.250000  0.000
1/1/2007  Technology         Apple    400   0.120  0.833333  0.146
1/1/2007  Technology     Microsoft   5500   0.048  1.000000  0.058
1/1/2007  Technology        Google    230   0.069  0.333333  0.069
1/1/2007  Technology        Lenova     36   0.036  0.166667  0.000
1/1/2007  Technology           IBM    250   0.016  0.666667  0.016
1/1/2007  Technology        Sprint    231   0.013  0.500000  0.013

My data is large, with many categories, dates, companies. Would love to see an efficient way of programming this. Thanks for the help.

Upvotes: 1

Views: 599

Answers (1)

Little Bobby Tables

Reputation: 4744

Whilst I wish this were a groupby-of-a-groupby solution, it isn't. It is a bit of a dirty hack. The reason I could not do the groupby solution is that, as far as I am aware, there is no way of selecting and passing columns into multiple-argument functions with a groupby. Enough of what cannot be done...

Now, I said it was hacky, so give it a go on your data set. I don't know how fast it is on a large data set, but do let me know.

import pandas as pd

# make a lazy example
date = ['1/1/2017'] * 10
category = ['car'] * 5 + ['tech'] * 5
company = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
price = [10, 300, 100, 400, 500, 230, 324, 543, 234, 124]
weight = [0.2, 0.1, 0.3, 0.2, 0.2, 0.15, 0.15, 0.4, 0.1, 0.2]

data = {'date': date, 'category': category, 'company': company, 'price': price, 'weight': weight}
df = pd.DataFrame(data)

# your percentile rank per date and category
df['pctile'] = df.price.groupby([df.date, df.category]).rank(pct=True)

# label rows above the 70th percentile 'upper' and below the 30th 'lower'
def seventy_thirty(df):
    s = pd.Series('upper', index=df.index[df.pctile > 0.7])
    l = pd.Series('lower', index=df.index[df.pctile < 0.3])
    return pd.concat([s, l])

df['pctile_summary'] = seventy_thirty(df)

# sum the weights per (date, category, upper/lower) so we can merge them back in
weighted = df.groupby(['date', 'category', 'pctile_summary'])[['weight']].sum()

# copy the 'lower' sums onto the 'upper' rows, as we'll need them in final_weight
add_lower = weighted.loc[weighted.index.get_level_values('pctile_summary') == 'lower', ['weight']].reset_index(level=2)
add_lower.pctile_summary = 'upper'
add_lower = add_lower.set_index('pctile_summary', append=True)
weighted = pd.merge(weighted, add_lower, how='left', left_index=True, right_index=True, suffixes=['', '_lower'])

# now merge the sums back on and calculate final_weight
df1 = pd.merge(df, weighted.reset_index(), how='left', on=['date', 'category', 'pctile_summary'], suffixes=['', '_sum'])
df1.loc[df1.pctile_summary == 'lower', 'final_weight'] = 0
df1.loc[df1.pctile_summary.isnull(), 'final_weight'] = df1.weight
df1.loc[df1.pctile_summary == 'upper', 'final_weight'] = (df1.weight / df1.weight_sum) * (df1.weight_sum + df1.weight_lower)

# finally tidy up (delete all that hard work!)
df1 = df1.drop(['pctile_summary', 'weight_sum', 'weight_lower'], axis=1)
df1
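Since speed matters on a large data set: the same three rules can also be done without any merges, using `groupby().transform()` to broadcast the per-group sums back onto the rows. This is only a sketch, not a drop-in replacement, and it assumes the same lowercase column names as the toy example above:

```python
import pandas as pd

def final_weights(df):
    """Zero out the bottom 30% and scale up the top 30% per (date, category),
    keeping each group's total weight unchanged."""
    keys = [df['date'], df['category']]
    pctile = df.groupby(['date', 'category'])['price'].rank(pct=True)
    upper = pctile > 0.7
    lower = pctile < 0.3

    # per-group sums of the upper and lower weights, broadcast back to rows
    upper_sum = df['weight'].where(upper, 0).groupby(keys).transform('sum')
    lower_sum = df['weight'].where(lower, 0).groupby(keys).transform('sum')

    out = df['weight'].copy()
    out[lower] = 0.0
    out[upper] = df['weight'] / upper_sum * (upper_sum + lower_sum)
    return out

# the same toy example as above
df = pd.DataFrame({
    'date': ['1/1/2017'] * 10,
    'category': ['car'] * 5 + ['tech'] * 5,
    'price': [10, 300, 100, 400, 500, 230, 324, 543, 234, 124],
    'weight': [0.2, 0.1, 0.3, 0.2, 0.2, 0.15, 0.15, 0.4, 0.1, 0.2],
})
df['final_weight'] = final_weights(df)
```

Two passes over the grouping keys and no intermediate frames, so it should scale better with many dates and categories.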

Upvotes: 1
