Applying a function for a multi-index dataframe

Question

I have a multi-index dataframe that is indexed by date and stock. Here is an example:

                     Column1
Date        Ticker
2008-08-01  AAPL     value1
            MSFT     value2
            IBM      value3
            etc.
2008-08-02  AAPL     value4
            MSFT     value5
            IBM      value6
            etc.

What I am trying to do is apply a function to this column for each date, and then replace that column. For example, say for each date, starting with 2008-08-01, I want to the average of Column1 and take the difference between each ticker's value and this average. And then Column1 will be replaced by these values. I'd end up having:

                     Column1
Date        Ticker
2008-08-01  AAPL     avg(Column1, 8/1/08) - value1
            MSFT     avg(Column1, 8/1/08) - value2
            IBM      avg(Column1, 8/1/08) - value3
            etc.
2008-08-02  AAPL     avg(Column1, 8/2/08) - value4
            MSFT     avg(Column1, 8/2/08) - value5
            IBM      avg(Column1, 8/2/08) - value6
            etc.

I can achieve this by doing:

df = df.copy()
col1_adjusted = [func(df.loc[df.index.get_level_values('Date') == date]['Column1']) for date in dates]
col1_adjusted = [item for sublist in col1_adjusted for item in sublist]
df.Column1= col1_adjusted

But that takes a long time because I have a lot of dates. Is there a better way to do this?

ZaxR · Accepted Answer

Debasish is right about the split-apply-combine approach. For a simple example:

# Use groupby to find the average by date; sub in w/e func for mean    
df['Column2'] = df.groupby(level='Date').transform('mean')

# Replace Column1 with the desired value
df['Column1'] = df['Column2'] - df['Column1']

The above example was for clarity, but here is the better, one-liner:

df['Column1'] = df.groupby(level='Date')['Column1'].transform('mean').sub(df['Column1'])

Applying a function for a multi-index dataframe

Answers (1)

Related Questions