Reputation: 1716
I want to groupby('Ticker') this multiIndex Dataframe and then Apply a function that returns a Series for each ticker and add the result to a new column on the df.
def Indicator(dataf):
df = dataf.copy()
df['TR1'] = df.High.sub(df.Low)
df['TR2'] = abs(df.High.sub(df.Close.shift(1)))
df['TR3'] = abs(df.Low.sub(df.Close.shift(1)))
df['TR'] = df[['TR1', 'TR2', 'TR3']].max(axis=1)
df['TR_mean'] = df['TR'].resample('M').mean().shift(1).resample('D').fillna('bfill')
df['Vol_mean'] = df['Volume'].resample('M').mean().shift(1).resample('D').fillna('bfill')
indicator = (df.TR.div(df.TR_mean)).div(df.Volume.div(df.Vol_mean))
return indicator
I try something Like this:
tickers.groupby('Ticker').apply(Indicator)
But I get this error: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
Dataframe:
Close High Low Open Volume
Date Ticker
2010-01-04 AAPL 6048.299805 6048.299805 5974.430176 5975.520020 1.043444e+08
GOOG 1132.989990 1133.869995 1116.560059 1116.560059 3.991400e+09
TSM 10654.79003 10694.49023 10608.13948 10609.33984 1.044000e+05
2010-01-05 AAPL 6031.859863 6058.020020 6015.669922 6043.939941 1.175721e+08
GOOG 1132.989990 1133.869995 1116.560059 1116.560059 3.991400e+09
TSM 10654.79003 10694.49023 10608.13948 10609.33984 1.044000e+05
Upvotes: 1
Views: 243
Reputation: 1372
In order to fix that error, you just need to add the following line in the Indicator
function, after the copy
operation:
df.index = df.index.get_level_values(0)
The problem indeed is due to the fact you pass a MultiIndex instead of a DateTime index to the resample
method in the function (which is a function to work with time series). What the extra line is doing is basically replacing the MultiIndex by just the DateTime part of the index. This results in the following:
>>> df_orig
Close High Low Open Volume
Date Ticker
2010-01-04 AAPL 6048.299805 6048.299805 5974.430176 5975.520020 1.043444e+08
GOOGL 1132.989990 1133.869995 1116.560059 1116.560059 3.991400e+09
TSM 10654.790030 10694.490230 10608.139480 10609.339840 1.044000e+05
2010-01-05 AAPL 6031.859863 6058.020020 6015.669922 6043.939941 1.175721e+08
GOOGL 1132.989990 1133.869995 1116.560059 1116.560059 3.991400e+09
TSM 10654.790030 10694.490230 10608.139480 10609.339840 1.044000e+05
>>> df_orig.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6 entries, (2010-01-04 00:00:00, AAPL) to (2010-01-05 00:00:00, TSM)
Data columns (total 5 columns):
Close 6 non-null float64
High 6 non-null float64
Low 6 non-null float64
Open 6 non-null float64
Volume 6 non-null float64
dtypes: float64(5)
memory usage: 410.0+ bytes
>>> df_orig.groupby("Ticker").apply(Indicator)
Date 2010-01-04 2010-01-05
Ticker
AAPL NaN NaN
GOOGL NaN NaN
TSM NaN NaN
Of course you can also remove the Ticker
column prior to the groupby-apply part like this:
ticker_idx = df_orig.index.get_level_values(1)
df_orig.reset_index(1, drop=True).groupby(ticker_idx).apply(Indicator)
This way you don't need to add that extra line in the function.
Also, I'm getting a bunch of NaN
as result of the groupby-apply operation, but by looking at your function's code, I assume that's due to the fact the function was expecting more data then 2 days. Let me know if this is correct.
Upvotes: 1