Fede
Fede

Reputation: 1716

Apply Function to MultiIndex DataFrame by Groupby

I want to groupby('Ticker') this multiIndex Dataframe and then Apply a function that returns a Series for each ticker and add the result to a new column on the df.

def Indicator(dataf):

    df = dataf.copy()
    df['TR1'] = df.High.sub(df.Low)
    df['TR2'] = abs(df.High.sub(df.Close.shift(1)))
    df['TR3'] = abs(df.Low.sub(df.Close.shift(1)))
    df['TR'] = df[['TR1', 'TR2', 'TR3']].max(axis=1)
    df['TR_mean'] = df['TR'].resample('M').mean().shift(1).resample('D').fillna('bfill')
    df['Vol_mean'] = df['Volume'].resample('M').mean().shift(1).resample('D').fillna('bfill')
    indicator = (df.TR.div(df.TR_mean)).div(df.Volume.div(df.Vol_mean))

    return indicator

I try something Like this:

tickers.groupby('Ticker').apply(Indicator)

But I get this error: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'

Dataframe:

                        Close           High         Low               Open         Volume
Date        Ticker                  
2010-01-04  AAPL        6048.299805    6048.299805  5974.430176 5975.520020 1.043444e+08
            GOOG        1132.989990    1133.869995  1116.560059 1116.560059 3.991400e+09
            TSM         10654.79003    10694.49023  10608.13948 10609.33984 1.044000e+05
2010-01-05  AAPL        6031.859863    6058.020020  6015.669922 6043.939941 1.175721e+08
            GOOG        1132.989990    1133.869995  1116.560059 1116.560059 3.991400e+09
            TSM         10654.79003    10694.49023  10608.13948 10609.33984 1.044000e+05

Upvotes: 1

Views: 243

Answers (1)

MkWTF
MkWTF

Reputation: 1372

In order to fix that error, you just need to add the following line in the Indicator function, after the copy operation:

df.index = df.index.get_level_values(0)

The problem indeed is due to the fact you pass a MultiIndex instead of a DateTime index to the resample method in the function (which is a function to work with time series). What the extra line is doing is basically replacing the MultiIndex by just the DateTime part of the index. This results in the following:

>>> df_orig
                          Close          High           Low          Open        Volume
Date       Ticker
2010-01-04 AAPL     6048.299805   6048.299805   5974.430176   5975.520020  1.043444e+08
           GOOGL    1132.989990   1133.869995   1116.560059   1116.560059  3.991400e+09
           TSM     10654.790030  10694.490230  10608.139480  10609.339840  1.044000e+05
2010-01-05 AAPL     6031.859863   6058.020020   6015.669922   6043.939941  1.175721e+08
           GOOGL    1132.989990   1133.869995   1116.560059   1116.560059  3.991400e+09
           TSM     10654.790030  10694.490230  10608.139480  10609.339840  1.044000e+05

>>> df_orig.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6 entries, (2010-01-04 00:00:00, AAPL) to (2010-01-05 00:00:00, TSM)
Data columns (total 5 columns):
Close     6 non-null float64
High      6 non-null float64
Low       6 non-null float64
Open      6 non-null float64
Volume    6 non-null float64
dtypes: float64(5)
memory usage: 410.0+ bytes

>>> df_orig.groupby("Ticker").apply(Indicator)
Date    2010-01-04  2010-01-05
Ticker
AAPL           NaN         NaN
GOOGL          NaN         NaN
TSM            NaN         NaN

Of course you can also remove the Ticker column prior to the groupby-apply part like this:

ticker_idx = df_orig.index.get_level_values(1)
df_orig.reset_index(1, drop=True).groupby(ticker_idx).apply(Indicator)

This way you don't need to add that extra line in the function.

Also, I'm getting a bunch of NaN as result of the groupby-apply operation, but by looking at your function's code, I assume that's due to the fact the function was expecting more data then 2 days. Let me know if this is correct.

Upvotes: 1

Related Questions