Doggie52

Reputation: 153

Weighted average of time-series with changing weights over time

I need to generate a weighted average of the returns of some number of financial assets, where the weights used change over time. (The application is financial but the problem itself is a very general one).

Excerpts of my returns and weights are as follows:

returns_df: Returns of assets over time

┌──────────────────┬───────────┬───────────┬───────────┐
│     DateTime     │  Asset 1  │  Asset 2  │  Asset 3  │
├──────────────────┼───────────┼───────────┼───────────┤
│ 2015-04-09 07:00 │ -0.000959 │ -0.000207 │ -0.000233 │
│ 2015-04-09 08:00 │ -0.004003 │  0.000169 │  0.001221 │
│ 2015-04-09 09:00 │ -0.000700 │ -0.000070 │ -0.000096 │
│ 2015-04-09 10:00 │ -0.000812 │ -0.000289 │  0.000177 │
│ 2015-04-09 11:00 │ -0.000030 │ -0.000168 │ -0.000315 │
└──────────────────┴───────────┴───────────┴───────────┘

weights_df: Weights of assets over time

┌──────────────────┬─────────┬─────────┬─────────┐
│ Start of weights │ Asset 1 │ Asset 2 │ Asset 3 │
├──────────────────┼─────────┼─────────┼─────────┤
│ 2015-03-01       │       1 │       0 │       0 │
│ 2015-04-01       │   0.023 │  0.8733 │  0.1037 │
│ 2015-05-01       │       1 │       0 │       0 │
│ 2015-06-01       │  0.0477 │  0.8278 │  0.1245 │
└──────────────────┴─────────┴─────────┴─────────┘

For example, the returns in the first table would all be weighted by {0.023; 0.8733; 0.1037}, as they all fall on or after 2015-04-01 and before 2015-05-01.

My real data set has returns spanning the entire range of dates in my weights, of course.

I am not at all sure how to approach this. I thought of using groupby(), but since weights_df does not have the same shape as returns_df, that doesn't seem to work.

import numpy as np
import pandas as pd
from io import StringIO
# on Python 2, use `from StringIO import StringIO` instead

returns_datatext = StringIO("""
    DateTime     │  Asset 1  │  Asset 2  │  Asset 3
2015-04-09 07:00 │ -0.000959 │ -0.000207 │ -0.000233
2015-04-09 08:00 │ -0.004003 │  0.000169 │  0.001221
2015-04-09 09:00 │ -0.000700 │ -0.000070 │ -0.000096
2015-04-09 10:00 │ -0.000812 │ -0.000289 │  0.000177
2015-04-09 11:00 │ -0.000030 │ -0.000168 │ -0.000315
""")
# engine='python' handles the non-ASCII '│' separator
returns_df = pd.read_table(returns_datatext, index_col=[0], parse_dates=True, sep='│', engine='python')

weights_datatext = StringIO("""
Start of weights │ Asset 1 │ Asset 2 │ Asset 3
2015-03-01       │       1 │       0 │       0
2015-04-01       │   0.023 │  0.8733 │  0.1037
2015-05-01       │       1 │       0 │       0
2015-06-01       │  0.0477 │  0.8278 │  0.1245
""")
weights_df = pd.read_table(weights_datatext, index_col=[0], parse_dates=True, sep='│', engine='python')

Expected results for the table above would be the following:

┌──────────────────┬──────────────────┐
│     DateTime     │ Weighted average │
├──────────────────┼──────────────────┤
│ 2015-04-09 07:00 │        -0.000227 │
│ 2015-04-09 08:00 │         0.000182 │
│ 2015-04-09 09:00 │        -0.000087 │
│ 2015-04-09 10:00 │        -0.000253 │
│ 2015-04-09 11:00 │        -0.000180 │
└──────────────────┴──────────────────┘
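
As a quick hand-check of the first expected row (my own arithmetic, not from the data source): the weights in force on 2015-04-09 are those starting 2015-04-01, and since they sum to 1 the weighted average is just a dot product.

import numpy as np

r = [-0.000959, -0.000207, -0.000233]  # returns at 2015-04-09 07:00
w = [0.023, 0.8733, 0.1037]            # weights effective from 2015-04-01
print(np.dot(w, r))                    # ~ -0.000227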

Upvotes: 1

Views: 867

Answers (1)

ALollz

Reputation: 59519

This is a job for pd.merge_asof to link the two DataFrames, followed by numpy.average for the row-wise weighted average.

import pandas as pd
import numpy as np

# Normalise the column names produced by parsing the sample data above
# (they otherwise keep the padding spaces from the box-drawn tables)
returns_df.index.name = 'DateTime'
returns_df.columns = ['Asset1', 'Asset2', 'Asset3']
weights_df.index.name = 'Start of weights'
weights_df.columns = ['Asset1', 'Asset2', 'Asset3']

# As-of merge: each return timestamp picks up the most recent weights
# row at or before it
df = pd.merge_asof(returns_df, weights_df,
                   left_index=True, right_index=True,
                   direction='backward',
                   suffixes=['', '_weight'])

cols = ['Asset1', 'Asset2', 'Asset3']
weight_cols = [col + '_weight' for col in cols]

# Row-wise weighted average of the asset returns
returns_df['weighted_average'] = np.average(df[cols], weights=df[weight_cols], axis=1)

Output: returns_df

                       Asset1    Asset2    Asset3  weighted_average
DateTime                                                           
2015-04-09 07:00:00 -0.000959 -0.000207 -0.000233         -0.000227
2015-04-09 08:00:00 -0.004003  0.000169  0.001221          0.000182
2015-04-09 09:00:00 -0.000700 -0.000070 -0.000096         -0.000087
2015-04-09 10:00:00 -0.000812 -0.000289  0.000177         -0.000253
2015-04-09 11:00:00 -0.000030 -0.000168 -0.000315         -0.000180

For illustration, this is the DataFrame that results from the merge. With direction='backward', each row in returns_df is matched with the closest date in weights_df that falls on or before it:

                       Asset1    Asset2    Asset3  Asset1_weight  Asset2_weight  Asset3_weight
DateTime                                                                                      
2015-04-09 07:00:00 -0.000959 -0.000207 -0.000233          0.023         0.8733         0.1037
2015-04-09 08:00:00 -0.004003  0.000169  0.001221          0.023         0.8733         0.1037
2015-04-09 09:00:00 -0.000700 -0.000070 -0.000096          0.023         0.8733         0.1037
2015-04-09 10:00:00 -0.000812 -0.000289  0.000177          0.023         0.8733         0.1037
2015-04-09 11:00:00 -0.000030 -0.000168 -0.000315          0.023         0.8733         0.1037
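
As an aside (not part of the answer above, just an equivalent way to express the same backward-looking alignment), the weights can also be forward-filled onto the returns index with reindex. The variable names below are my own; cols and the renamed frames are reused from the code above:

# forward-fill each month's weights onto the hourly returns index
w_aligned = weights_df.reindex(returns_df.index, method='ffill')

# row-wise weighted sum; if the weights did not already sum to 1, divide by
# w_aligned[cols].sum(axis=1) (np.average performs that normalisation itself)
alt_average = (returns_df[cols] * w_aligned[cols]).sum(axis=1)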

Upvotes: 2
