I am passing a dictionary of pandas DataFrames (or a pandas Panel) into the function below in order to convert daily data to monthly data. Each DataFrame represents one field (e.g. Open, High, Low or Close) in datetime vs. stock code space, i.e. dates as the index and stock codes as the columns. The function works fine, but I am getting deprecation warnings.

I can't find an efficient implementation using the newly refactored resample(), though. Most of the examples I can find use the agg() function to apply different methods to different columns of a single DataFrame. My panel has a separate frame for each field, though, so this doesn't quite fit. I've tried using apply(lambda) and it works, but it is unreasonably slow. I'm sure there is an efficient implementation for this. I've noticed several questions that have been answered based on the deprecated implementation, and a similar question to mine that has not yet been answered.
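For concreteness, here is a minimal sketch of the input layout described above: a dict keyed by field name, where each DataFrame has a daily DatetimeIndex and one column per stock code. The helper name make_daily_fields, the tickers and the random-walk prices are all hypothetical, made up purely for illustration.

```python
import numpy as np
import pandas as pd


def make_daily_fields(tickers, start="2016-01-01", periods=90, seed=0):
    """Hypothetical helper: return {field: DataFrame(date x stock code)} of daily bars."""
    rng = np.random.default_rng(seed)
    shape = (periods, len(tickers))
    index = pd.date_range(start, periods=periods, freq="D")

    close = 100 + rng.normal(0, 1, shape).cumsum(axis=0)          # random-walk closes
    open_ = close + rng.normal(0, 0.5, shape)
    high = np.maximum(open_, close) + rng.uniform(0, 1, shape)    # High >= Open and Close
    low = np.minimum(open_, close) - rng.uniform(0, 1, shape)     # Low <= Open and Close
    volume = rng.integers(1_000, 10_000, shape)

    arrays = {"Open": open_, "High": high, "Low": low, "Close": close, "Volume": volume}
    # One DataFrame per field: rows are dates, columns are stock codes
    return {name: pd.DataFrame(a, index=index, columns=tickers) for name, a in arrays.items()}


data_d = make_daily_fields(["AAA", "BBB", "CCC"])
```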
Here's my original function:
```python
import pandas as pd

# function to convert daily data to monthly and return dictionary or panel
def to_monthly(fields, data_d, create_panel=True):
    how_dict = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
    data_m = {}
    for field in fields:
        data_m[field] = data_d[field].resample(rule='M', how=how_dict[field]).ffill()
    if create_panel:
        data_m = pd.Panel(data_m)
    return data_m
```
This runs fine, but I get the deprecation warnings. My attempt to solve this is:
```python
import pandas as pd

# alternative function to handle refactoring of .resample()
def to_monthly(fields, data_d, create_panel=True):
    how_dict = {
        'Open': lambda x: x.iloc[0],    # first row of the month (positional)
        'High': lambda x: x.max(),
        'Low': lambda x: x.min(),
        'Close': lambda x: x.iloc[-1],  # last row of the month (positional)
        'Volume': lambda x: x.sum(),
    }
    data_m = {}
    for field in fields:
        data_m[field] = data_d[field].resample('M').apply(how_dict[field]).ffill()
    if create_panel:
        data_m = pd.Panel(data_m)
    return data_m
```
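A likely reason the lambda version is slow: a Python callable passed to .apply() is typically evaluated in Python for every monthly bin, whereas a method name such as 'max' or 'last' is dispatched to pandas' built-in groupby reductions. The sketch below, on made-up data, checks that the named aggregations give the same numbers as the lambdas above; MONTH_END is an assumption standing in for the 'M' rule. One caveat: 'first' and 'last' skip missing values, while .iloc[0] and .iloc[-1] take the literal first and last rows, so the two can differ when a month starts or ends with NaN.

```python
import numpy as np
import pandas as pd

# Month-end rule, same as 'M' (newer pandas spells the string alias 'ME')
MONTH_END = pd.offsets.MonthEnd()

index = pd.date_range("2016-01-01", periods=120, freq="D")
rng = np.random.default_rng(1)
prices = pd.DataFrame(rng.uniform(90, 110, (120, 2)), index=index, columns=["AAA", "BBB"])

# The question's per-field lambdas (with .iloc for positional access)...
lambdas = {
    "Open": lambda x: x.iloc[0],
    "High": lambda x: x.max(),
    "Low": lambda x: x.min(),
    "Close": lambda x: x.iloc[-1],
}
# ...and the equivalent built-in aggregation names
names = {"Open": "first", "High": "max", "Low": "min", "Close": "last"}

results = {}
for field in names:
    slow = prices.resample(MONTH_END).apply(lambdas[field])  # Python-level call per bin
    fast = prices.resample(MONTH_END).agg(names[field])      # built-in reduction
    results[field] = (slow, fast)
```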
I haven't found it easy to locate a replacement syntax for all the old "how" options, so some assistance with that would also be appreciated. The pandas documentation doesn't always list all the options available for a given parameter or usage, and I've seen that others have had similar problems.
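As a rough guide to the replacement syntax (a sketch, checked here only for the reductions listed): each old how='name' string generally becomes either a method of the same name on the Resampler, .resample(rule).name(), or the same string passed to .agg(). The list of names below is illustrative rather than exhaustive, MONTH_END again stands in for 'M', and the data is made up.

```python
import numpy as np
import pandas as pd

MONTH_END = pd.offsets.MonthEnd()  # same rule as 'M' ('ME' in newer pandas)

index = pd.date_range("2016-01-01", periods=100, freq="D")
s = pd.Series(np.arange(100, dtype=float), index=index)
r = s.resample(MONTH_END)

# Old: s.resample('M', how=name)   -> the how= keyword is not accepted by current pandas
# New: r.name()  or  r.agg(name)
old_how_names = ["first", "last", "max", "min", "sum", "mean", "median", "std", "count"]

mismatches = []
for name in old_how_names:
    by_method = getattr(r, name)()   # e.g. r.max()
    by_agg = r.agg(name)             # e.g. r.agg('max')
    if not by_method.equals(by_agg):
        mismatches.append(name)
```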
Any help would be greatly appreciated. Thank you.
You can use Resampler.aggregate (also available as .agg()). Keep a dict that maps each column (field) name to the name of its intended aggregation, and pass the entry for the current field to .agg():
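```python
dict_ohlcv = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}
for field in fields:
    # look up the method name for this field and apply it to the monthly bins
    data_m[field] = data_d[field].resample('M').agg(dict_ohlcv[field]).ffill()
```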
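Put together, a minimal updated version of the question's function might look like the sketch below. Two assumptions: since pd.Panel was removed in pandas 1.0, the combined result is returned as a single DataFrame with (field, stock code) MultiIndex columns built with pd.concat; and the create_panel flag is renamed to a hypothetical combine flag to reflect that. MONTH_END stands in for 'M' and the sample data is made up.

```python
import numpy as np
import pandas as pd

# Month-end rule; equivalent to 'M' (pandas >= 2.2 spells the string alias 'ME')
MONTH_END = pd.offsets.MonthEnd()

HOW = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}


def to_monthly(fields, data_d, combine=True):
    """Convert {field: daily DataFrame (date x stock code)} to monthly bars."""
    data_m = {}
    for field in fields:
        # one built-in aggregation per field frame, applied to every stock column
        data_m[field] = data_d[field].resample(MONTH_END).agg(HOW[field]).ffill()
    if combine:
        # stand-in for pd.Panel: columns become a (field, stock code) MultiIndex
        return pd.concat(data_m, axis=1)
    return data_m


# Usage on made-up data: 90 days starting 2016-01-01 span January to March
index = pd.date_range("2016-01-01", periods=90, freq="D")
rng = np.random.default_rng(0)
daily = {f: pd.DataFrame(rng.uniform(90, 110, (90, 2)), index=index, columns=["AAA", "BBB"])
         for f in HOW}
monthly = to_monthly(list(HOW), daily)
```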
The deprecation warnings you get are due to an API-breaking change in pandas 0.18.0 that made the .resample() method behave more like .groupby() [source: Resample API].
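One way to see what "more like .groupby()" means in practice, as a sketch on made-up data: .resample() now returns a deferred Resampler object, much like a GroupBy, and gives the same result as grouping by a pd.Grouper with the same frequency. It also accepts a list of aggregations the way groupby().agg() does. MONTH_END is an assumed stand-in for the 'M' rule.

```python
import numpy as np
import pandas as pd

MONTH_END = pd.offsets.MonthEnd()  # same rule as 'M'

index = pd.date_range("2016-01-01", periods=75, freq="D")
close = pd.Series(np.linspace(100.0, 130.0, 75), index=index, name="AAA")

r = close.resample(MONTH_END)  # a Resampler: nothing is computed yet, like a GroupBy
via_resample = r.max()
via_groupby = close.groupby(pd.Grouper(freq=MONTH_END)).max()

# Several aggregations at once, as with groupby().agg([...])
bars = r.agg(["first", "max", "min", "last"])
```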