Reputation: 55
I have a multi-index dataframe that is indexed by date and stock. Here is an example:
Column1
Date Ticker
2008-08-01 AAPL value1
MSFT value2
IBM value3
etc.
2008-08-02 AAPL value4
MSFT value5
IBM value6
etc.
What I am trying to do is apply a function to this column for each date, and then replace that column. For example, say for each date, starting with 2008-08-01, I want to the average of Column1 and take the difference between each ticker's value and this average. And then Column1 will be replaced by these values. I'd end up having:
Column1
Date Ticker
2008-08-01 AAPL avg(Column1, 8/1/08) - value1
MSFT avg(Column1, 8/1/08) - value2
IBM avg(Column1, 8/1/08) - value3
etc.
2008-08-02 AAPL avg(Column1, 8/2/08) - value4
MSFT avg(Column1, 8/2/08) - value5
IBM avg(Column1, 8/2/08) - value6
etc.
I can achieve this by doing:
df = df.copy()
col1_adjusted = [func(df.loc[df.index.get_level_values('Date') == date]['Column1']) for date in dates]
col1_adjusted = [item for sublist in col1_adjusted for item in sublist]
df.Column1= col1_adjusted
But that takes a long time because I have a lot of dates. Is there a better way to do this?
Upvotes: 0
Views: 3226
Reputation: 5165
Debasish is right about the split-apply-combine approach. For a simple example:
# Use groupby to find the average by date; sub in w/e func for mean
df['Column2'] = df.groupby(level='Date').transform('mean')
# Replace Column1 with the desired value
df['Column1'] = df['Column2'] - df['Column1']
The above example was for clarity, but here is the better, one-liner:
df['Column1'] = df.groupby(level='Date')['Column1'].transform('mean').sub(df['Column1'])
Upvotes: 1